Abstract

The Graph Motif problem was introduced in 2006 in the context of biological networks. 
It consists of deciding whether or not a multiset of colors occurs in a connected subgraph 
of a vertex-colored graph. Graph Motif has been mostly analyzed from the standpoint 
of parameterized complexity. The main parameters which came into consideration were the 
size of the multiset and the number of colors. In many uses of Graph Motif,
however, the input graph originates from real-life applications and has structure. Motivated by
this prosaic observation, we systematically study its complexity relative to graph structural
parameters. For a wide range of parameters, we give new or improved FPT algorithms, or show 
that the problem remains intractable. For the FPT cases, we also give some kernelization 
lower bounds as well as some ETH-based lower bounds on the worst case running time. 
Interestingly, we establish that Graph Motif is W[1]-hard (while in W[P]) for parameter
max leaf number, which is, to the best of our knowledge, the first problem to behave this way. 


1 Introduction 

The Graph Motif problem has received a lot of attention during the last decade. Informally, 
Graph Motif is defined as follows: given a graph with arbitrary colors on the nodes and a 
multiset of colors called the motif, the goal is to decide if there exists a subset of vertices of the 
graph such that (1) the subgraph induced by this subset is connected and (2) the colors on the 
subset of vertices match the motif, i.e. each color appears the same number of times as in the motif. 
Originally, this problem was motivated by applications in biological network analysis [34]. However,
it also proves useful in social or technical networks [5] or in the context of mass spectrometry [8].

Studying biological networks allows a better characterization of species, by determining small 
recurring subnetworks, often called motifs. Such motifs can correspond to a set of nodes realizing 
some function, which may have been evolutionarily preserved. Thus, it is crucial to determine
these motifs to identify common elements between species and transfer the biological knowledge. 
Graph Motif corresponds to topology-free queries and can be seen as a variant of a graph pattern 
matching problem with the sole topological requirement of connectedness. Such queries were also 
studied extensively for sequences during the last thirty years, and with the increase of knowledge 
about biological networks, it is relevant to extend these queries to networks [41].


2 Preliminaries and previous work 

For any two integers $x < y$, we set $[x, y] := \{x, x + 1, \ldots, y - 1, y\}$, and for any positive integer $x$,
$[x] := [1, x]$. If $G$ is a graph, we denote by $V(G)$ its set of vertices and by $E(G)$ its set of edges. If
$G = (V, E)$ is a graph and $S \subseteq V$, $E_G(S)$ denotes the subset of edges of $E$ having both endpoints
in $S$. If $G = (V, E)$ is a graph and $S \subseteq V$ is a subset of vertices, $G[S]$ denotes the subgraph of $G$
induced by $S$: $(S, E_G(S))$. For a vertex $v \in V$, the set of neighbors of $v$ in $G$ is denoted by $N_G(v)$,
and $N_G(S) := (\bigcup_{v \in S} N_G(v)) \setminus S$. We define $N_G[v] := N_G(v) \cup \{v\}$ and $N_G[S] := N_G(S) \cup S$. In
all the previous definitions, we will drop the subscript $G$ whenever the graph $G$ we are referring to
is either implicit or irrelevant. We say that a vertex $v$ dominates a set of vertices $S$ if $S \subseteq N[v]$.
A set of vertices $R$ dominates another set of vertices $S$ if $S \subseteq N[R]$. If $G = (V, E)$ is a graph and
$V' \subseteq V$, $G - V'$ denotes the graph $G[V \setminus V']$. A universal vertex $v$, in a graph $G = (V, E)$, is
such that $N_G[v] = V$. A matching of a graph is a set of mutually disjoint edges. In an explicitly
bipartite graph $G = (V_1 \cup V_2, E)$, we call a matching of size $\min(|V_1|, |V_2|)$ a perfect matching. A
cluster graph (or simply, cluster ) is a disjoint union of cliques. A co-cluster graph (or, co-cluster ) 
is the complement graph of a cluster graph. If C is a class of graphs, the distance to C of a graph 
G is the minimum number of vertices to remove from G to get a graph in C. 

If $f : A \to B$ is a function and $A' \subseteq A$, $f|_{A'}$ denotes the restriction of $f$ to $A'$, that is
$f|_{A'} : A' \to B$ such that $\forall x \in A'$, $f|_{A'}(x) = f(x)$. Similarly, if $E$ is a set of edges on vertices of $V$
and $V' \subseteq V$, $E|_{V'}$ is the subset of edges of $E$ having both endpoints in $V'$.

Multisets. A multiset is a generalization of the notion of set where each element may appear 
more than once. The multiplicity of the element $x$ in the multiset $M$, denoted by $m_M(x)$, is the
number of occurrences of $x$ in $M$. We adopt the natural convention that $m_M(x) = 0$ if $x$ does
not belong to $M$. The cardinality of a multiset $M$, denoted by $|M|$, is its number of elements
with their multiplicity: $\sum_x m_M(x)$. If $M$ and $N$ are two multisets, $M \cup N$ is the multiset $A$
such that $\forall x$, $m_A(x) = m_M(x) + m_N(x)$, and $M \setminus N$ is the multiset $D$ such that $\forall x$, $m_D(x) =
\max(0, m_M(x) - m_N(x))$. We write $M \subseteq N$ if and only if $M \setminus N = \emptyset$ and $M \subset N$ if and only if
$M \subseteq N$ and $M \neq N$.

Example 1. Let $M = \{1, 2, 2, 4, 5, 5, 5\}$ and $N = \{1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 5\}$. Then, $|M| = 7$,
$|N| = 12$, $M \setminus N = \emptyset$, $N \setminus M = \{1, 1, 3, 3, 5\}$, and $M \subset N$.
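The multiset operations above map directly onto Python's `collections.Counter`. The following is a minimal sketch that replays Example 1; the helper names (`multiset_union`, `multiset_difference`, `is_submultiset`, `cardinality`) are illustrative choices, not notation from the paper.

```python
from collections import Counter


def multiset_union(m, n):
    """M ∪ N as defined above: multiplicities add up."""
    return m + n


def multiset_difference(m, n):
    """M \\ N: each multiplicity is max(0, m_M(x) - m_N(x))."""
    # Counter subtraction discards elements whose count would be <= 0.
    return m - n


def is_submultiset(m, n):
    """M ⊆ N if and only if M \\ N is empty."""
    return not multiset_difference(m, n)


def cardinality(m):
    """|M|: number of elements counted with multiplicity."""
    return sum(m.values())


# The two multisets of Example 1.
M = Counter([1, 2, 2, 4, 5, 5, 5])
N = Counter([1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5, 5])
```

With these definitions, `sorted(multiset_difference(N, M).elements())` lists the elements of $N \setminus M$ with multiplicity.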

Graph Motif. The problem is defined as follows: 


Graph Motif 

• Input: A triple $(G, c, M)$, where $G = (V, E)$ is a graph, $c : V \to C$ is a coloring of the
vertices, and $M$ is a multiset of colors of $C$.

• Output: A subset $R \subseteq V$ such that

(1) $G[R]$ is connected and

(2) $c(R) = M$.


In the above definition, c(R) denotes the multiset of colors of vertices in R. We use that slight 
abuse of notation for convenience. We will refer to condition (1) as the connectivity constraint 
and to condition (2) as the multiset constraint. 
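To make the two constraints concrete, here is a small brute-force sketch that checks them literally. It is exponential and only meant for tiny instances; the graph representation (a dict mapping each vertex to the set of its neighbors) and all function names are assumptions made for this illustration.

```python
from collections import Counter
from itertools import combinations


def is_connected(adj, vertices):
    """Connectivity constraint: is G[vertices] connected (and non-empty)?"""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def is_solution(adj, color, motif, R):
    """Check both the connectivity and the multiset constraint for R."""
    return is_connected(adj, R) and Counter(color[v] for v in R) == Counter(motif)


def brute_force_graph_motif(adj, color, motif):
    """Try every vertex subset of size |M|; return a solution or None."""
    for R in combinations(sorted(adj), len(motif)):
        if is_solution(adj, color, motif, R):
            return set(R)
    return None


# A path 0 - 1 - 2 - 3 with colors r, g, r, b.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
color = {0: "r", 1: "g", 2: "r", 3: "b"}
```

On this path, the motif {r, g, b} is realized by the vertices 1, 2, 3, while the motif {r, r} has no solution because the two vertices colored r are not adjacent.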

Parameterized Complexity. A parameterized problem $(I, k)$ is said to be fixed-parameter tractable
(or in the class FPT) w.r.t. (with respect to) parameter $k$ if it can be solved in $f(k) \cdot |I|^c$ time (in
fpt-time), where $f$ is any computable function and $c$ is a constant (see [2I)I321E] for more details
about fixed-parameter tractability). The parameterized complexity hierarchy is composed of the
classes FPT $\subseteq$ W[1] $\subseteq$ W[2] $\subseteq \cdots \subseteq$ W[P] $\subseteq$ XP. The class XP is the set of problems solvable in
time $|I|^{f(k)}$ where $f$ is a computable function.

A W[1]-hard problem is not fixed-parameter tractable (unless FPT = W[1]) and one can prove
W[1]-hardness by means of a parameterized reduction from a W[1]-hard problem. This is a mapping
of an instance $(I, k)$ of a problem $A_1$ in $g(k) \cdot |I|^{O(1)}$ time (for any computable function $g$) into an
instance $(I', k')$ for $A_2$ such that $(I, k) \in A_1 \Leftrightarrow (I', k') \in A_2$ and $k' \le h(k)$ for some function $h$.

A powerful technique to design parameterized algorithms is kernelization. In short, kernelization
is a polynomial-time self-reduction algorithm that takes an instance $(I, k)$ of a parameterized
problem $P$ as input and computes an equivalent instance $(I', k')$ of $P$ such that $|I'| \le h(k)$ for
some computable function $h$ and $k' \le k$. The instance $(I', k')$ is called a kernel in this case. If the
function $h$ is polynomial, we say that $(I', k')$ is a polynomial kernel.

It is well known that a decidable problem is in FPT if and only if it has a kernel, but this 
equivalence yields super-polynomial kernels (in general). To design efficient parameterized
algorithms, a kernel of polynomial (or even linear) size in $k$ is important. However, some lower bounds
on the size of the kernel can be shown under the assumption that the polynomial hierarchy is a 
proper hierarchy. To show such results, we will use the cross-composition technique developed by 
Bodlaender et al. [9]. 

Definition 2 (Polynomial equivalence relation [9]). An equivalence relation $\mathcal{R}$ on $\Sigma^*$ is said to
be polynomial if the following two conditions hold:

(i) There is an algorithm that given two strings $x, y \in \Sigma^*$ decides whether $x$ and $y$ belong to the
same equivalence class in time $(|x| + |y|)^{O(1)}$.

(ii) For any finite set $S \subseteq \Sigma^*$ the equivalence relation $\mathcal{R}$ partitions the elements of $S$ into at most
$(\max_{x \in S} |x|)^{O(1)}$ classes.

Definition 3 (OR-cross-composition [9]). Let $L \subseteq \Sigma^*$ be a set and let $Q \subseteq \Sigma^* \times \mathbb{N}$ be a
parameterized problem. We say that $L$ cross-composes into $Q$ if there is a polynomial equivalence relation
$\mathcal{R}$ and an algorithm which, given $t$ strings $x_1, x_2, \ldots, x_t$ belonging to the same equivalence class of
$\mathcal{R}$, computes an instance $(x^*, k^*) \in \Sigma^* \times \mathbb{N}$ in time polynomial in $\sum_{i=1}^{t} |x_i|$ such that:

(i) $(x^*, k^*) \in Q \Leftrightarrow x_i \in L$ for some $1 \le i \le t$; and

(ii) $k^*$ is bounded by a polynomial in $\max_{i=1}^{t} |x_i| + \log t$.

Theorem 4 ([5]). Let $L \subseteq \Sigma^*$ be a set which is NP-hard under Karp reductions. If $L$
cross-composes into the parameterized problem $Q$, then $Q$ has no polynomial kernel unless NP $\subseteq$
coNP/poly.

(Strong) Exponential Time Hypothesis. The Exponential Time Hypothesis (ETH) is a 
conjecture by Impagliazzo et al. [30] asserting that there is no $2^{o(n)}$-time algorithm for 3-SAT
on instances with $n$ variables. The so-called sparsification lemma, also proved in [30], shows that
if ETH turns out to be true, then there is no $2^{o(n+m)}$-time algorithm solving 3-SAT where $m$
is the number of clauses. The Strong Exponential Time Hypothesis (SETH) by Impagliazzo and
Paturi [29] further asserts that, for every $\delta < 1$, there is an integer $k$ such that $k$-SAT cannot be
solved in time $O(2^{\delta n})$. Cygan et al. showed that, assuming SETH, for any $\delta < 1$, some problems
such as Hitting Set could not be solved in time $O(2^{\delta n})$ either [16], where $n$ is the number of
elements. The authors also conjectured that the same result should hold for the Set Cover
problem, and gave some supporting pieces of evidence. We will refer to the assumption that, for
any $\delta < 1$, Set Cover instances with $n$ elements cannot be solved in time $O(2^{\delta n})$ as SCH (for
Set Cover-hardness). We stress that the implication SETH $\Rightarrow$ SCH is not known
yet.

Previous work. Many results about the complexity of Graph Motif are known. The 
problem is NP-hard even with strong restrictions. For instance, it remains NP-hard for bipartite
graphs of maximum degree 4 and motifs containing two colors only [22], or for trees of maximum
degree 3 and when the motif is colorful (that is, no color occurs more than once) [22], or for rooted
trees of depth 2 [2]. However, the problem is solvable in polynomial time when the graph is a
caterpillar [2], or when both the number of colors in the motif and the treewidth of the graph are
bounded by a constant [22].

As Graph Motif is intractable even for very restricted classes of graphs, and considering 
that, in practice, the motif is supposed to be small compared to the graph, the parameterized 
complexity of Graph Motif relative to the size of the motif has been tackled. It is indeed in
FPT when parameterized by the size of the motif. At least seven different papers gave an FPT
algorithm [22] 2[ (25[ [33103 HU SO]. The best (randomized) algorithm runs in time $O^*(2^k)$ where
the $O^*$ notation suppresses polynomial factors Bam] and works well in practice for small values of
$k$, even with hundreds of millions of edges [B]. The current best deterministic algorithm takes time
$O^*(5.22^k)$ [40]. However, an algorithm running in time $O^*((2 - \epsilon)^k)$ would break the $2^n$ barrier
in solving Set Cover instances with $n$ elements (that is, would disprove SCH) [5]. Besides, it is
unlikely that Graph Motif admits a polynomial kernel, even on a restricted class of trees [2].
Ganian proved that Graph Motif is in FPT when the parameter is the size of a minimum vertex 
cover of the graph [25]. Actually, his algorithm is given for a smaller parameter called twin-cover.
Ganian also showed that Graph Motif can be solved in $O^*(2^k)$ for graphs with neighborhood
diversity $k$ [26]. On the negative side, the problem is W[1]-hard with respect to the number of
colors, even for trees [22]. To deal with the high rate of noise in biological data, many variants
of the problem have been introduced. For example, the approach of Dondi et al. requires a solution
with a minimum number of connected components [20], while the one of Betzler et al. asks for a
2-connected solution [4]. In other variants stemming purely from bio-informatics, some colors can
be added to, substituted or subtracted from the solution mm- 

In light of the previous paragraphs, it is clear that the complexity of Graph Motif is well 
known for different versions and constraints on the problem itself. However, only few works take 
into account the structure of the input graph. We believe that this is an interesting direction since
Graph Motif has applications in real-life problems, where the input is not random. For example,
some biological networks have been shown to be scale-free or to have small diameter [1]. We will therefore
introduce a systematic study with respect to structural graph parameters [32] [23]. We believe that 
this is also of theoretical interest, to understand how a given parameter influences the complexity 
of the problem. 

Our contribution. In Section 3 we improve the known FPT algorithms with parameter
distance to clique, vertex cover number, and edge clique cover number. We also give a
parameterized algorithm for the parameter distance to co-cluster which nicely reuses the FPT algorithms
for both vertex cover number and distance to clique and another algorithm for parameter vertex
clique cover number. These last two algorithms are noteworthy since a bounded distance to
co-cluster or a bounded vertex clique cover number do not imply a bounded neighborhood diversity,
a parameter for which Graph Motif was already known to be in FPT. We also show that a
polynomial kernel for the aforementioned parameters is unlikely and give some ETH-based lower
bounds for the worst case running time. In Section 4 we show that Graph Motif remains hard
on graphs of constant distance to disjoint paths, or constant bandwidth, or constant distance to
cluster, or constant dominating set number. More surprisingly, we establish that Graph Motif
is W[1]-hard (but in W[P]) for the parameter max leaf number. To the best of our knowledge,
there is no previously known problem behaving similarly when parameterized by max leaf number.
Indeed, graphs with bounded max leaf number are really simple and, for instance, all the problems
studied in [23] are FPT for this parameter. These positive and negative results draw a tight line
between tractability and intractability (see Figure 1).

3 FPT algorithms, kernelization and ETH-based lower bounds 

In this section, we improve or establish new FPT algorithms for several parameters. We
complement those algorithms with some lower bounds under ETH, SETH, and SCH. We also give a lower
bound on the size of the kernel for all those parameters except cluster editing number. Figure 1
summarizes those results.

3.1 Cluster editing and linear neighborhood diversity 

The cluster editing number of a graph is the number of edge deletions or additions required to 
get a cluster graph. It can be computed in time $O^*(1.62^k)$ [7]. We will use a known result
involving another parameter called neighborhood diversity, introduced by Lampis [35]. A graph
has neighborhood diversity $k$ if there is a partition of its vertices into at most $k$ sets such that
all the vertices in each set have the same type. Two vertices $u$ and $v$ have the same type
if $N(v) \setminus \{u\} = N(u) \setminus \{v\}$. We say that a graph parameter $\kappa$ has linear (resp. exponential)
neighborhood diversity if, for every positive integer $k$, all the graphs $G$ such that $\kappa(G) \le k$ have
neighborhood diversity $O(k)$ (resp. $2^{O(k)}$). We say that a parameter $\kappa$ has unbounded neighborhood
[Figure 1: diagram not reproduced. Recoverable legend entries: "FPT but no polynomial-size kernel unless NP $\subseteq$ coNP/poly" and "W[1]-hard, in W[P]". Recoverable node labels: "Distance to clique", "Vertex Cover".]

Figure 1: Hasse diagram of the relationship between different parameters ([32]). Two
parameters are connected by a line if the parameter below can be polynomially upper-bounded in the
parameter above. For example, vertex cover is above distance to disjoint paths since deleting a
vertex cover produces an independent set, hence a set of disjoint paths. Therefore, positive results
propagate upwards, while negative results propagate downwards. Results marked by ◊ are
obtained in this paper, those marked with • are improvements of existing results, and those marked
with * are corollaries of existing results. Parameter neighborhood diversity is not depicted since
its relations with vertex cover may be exponential. We refer to [35, Figure 1] for a diagram with
neighborhood diversity. We note that neighborhood diversity would be below vertex cover, not
comparable to feedback vertex set, pathwidth or treewidth, but above cliquewidth (this last would
be below treewidth).


diversity if there is no function $f$ such that all graphs $G$ with $\kappa(G) \le k$ have neighborhood
diversity $f(k)$.
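The definition of types can be turned into a short procedure for computing the neighborhood diversity of a small graph. The sketch below is an illustration only: it assumes the graph is given as a dict mapping each vertex to the set of its neighbors, and the function name `type_partition` is hypothetical. It relies on the fact that having the same type is an equivalence relation, so comparing each vertex against one representative per class suffices.

```python
def type_partition(adj):
    """Partition the vertices into types.

    Two vertices u and v have the same type when
    N(v) minus {u} equals N(u) minus {v}.
    The neighborhood diversity is the number of classes returned.
    """
    classes = []
    for v in adj:
        for cls in classes:
            u = cls[0]  # representative of the class
            if adj[v] - {u} == adj[u] - {v}:
                cls.append(v)
                break
        else:
            classes.append([v])
    return classes


# Complete bipartite graph K_{2,3}: two types (one per side).
k23 = {0: {2, 3, 4}, 1: {2, 3, 4}, 2: {0, 1}, 3: {0, 1}, 4: {0, 1}}
# Clique on 4 vertices: a single type.
k4 = {v: {u for u in range(4) if u != v} for v in range(4)}
# Path on 4 vertices: every vertex has its own type.
p4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
```

Cliques and independent sets both collapse to a single type, which is why dense graphs can still have small neighborhood diversity.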

Theorem 5 ([26]). Graph Motif can be solved in $O^*(2^k)$ on graphs with neighborhood diversity
$k$.


The following result is a direct consequence of the fact that, restricted to connected graphs, 
cluster editing has linear neighborhood diversity. 

Corollary 6. Graph Motif can be solved in $O^*(8^k)$, where $k$ is the cluster editing number.

Proof. Let $(G = (V, E), c, M)$ be any instance of Graph Motif. We can assume that $G$ is
connected, otherwise we run the algorithm in each connected component of $G$. Let $X$ be the set
of vertices which are an endpoint of an edited edge (deleted or added) and let $G'$ be the cluster
graph obtained by the $k$ edge editions. We may observe that $|X| \le 2k$ and that the number of
maximal cliques $C_1, \ldots, C_l$ in $G'$ is bounded by $k$ (otherwise, $G$ could not be connected). For
each $i \in [l]$, and for each vertex $v \in C_i \setminus X$, $N[v] = C_i$. Thus the neighborhood diversity of $G$
is bounded by $|X| + l \le 2k + k = 3k$. So, we can run the algorithm for bounded neighborhood
diversity [26] and it takes time $O^*(2^{3k})$. □

3.2 Parameters with exponential neighborhood diversity 

The next three parameters that we consider are distance to clique , size of a minimum vertex 
cover, and size of a minimum edge clique cover. For the first two, a value of $k$ entails that the
neighborhood diversity is at most $k + 2^k$, whereas edge clique cover number $k$ implies that the
neighborhood diversity is at most $2^k$. Therefore, Ganian has already given an algorithm running
in double exponential time for these parameters ($O^*(2^{k+2^k})$ or $O^*(2^{2^k})$, see Theorem 5 [25, 26]).
We improve this bound to single exponential time $2^{O(k)}$ (more precisely $O^*(3^k)$) for distance to
clique and to $2^{O(k \log k)}$ for the vertex cover and edge clique cover numbers. The latter running
time is sometimes called slightly superexponential FPT time |3Bj. Then, we prove that for each of
those three parameters, a polynomial kernel is unlikely.

As a preparatory lemma for the algorithm parameterized by distance to clique, we show that
a variant of Set Cover with thresholds is solvable in time $O^*(2^n)$, where $n$ is the size of the
universe. In the problem that we call here Colored Set Cover with Thresholds, one is
given a triple $(\mathcal{U}, \mathcal{S} = C_1 \uplus \cdots \uplus C_l, (a_1, \ldots, a_l))$ where $\mathcal{U}$ is a ground set of $n$ elements, $\mathcal{S}$ is a set of
subsets of $\mathcal{U}$ partitioned into $l$ classes called colors and $(a_1, \ldots, a_l)$ is a tuple of $l$ positive integers
called threshold vector. The goal is to find a set cover $\mathcal{T} \subseteq \mathcal{S}$ (not necessarily minimum) such that
for each $i \in [l]$, the number of sets with color $i$ (that is, in $C_i$) in $\mathcal{T}$ is at most $a_i$.

Lemma 7. Colored Set Cover with Thresholds with $n$ elements and $m$ sets can be solved
in time $O(nm2^n + nm)$.


Proof. We order the sets of $\mathcal{S}$ such that sets of the same color appear consecutively, say, first
the sets of $C_1$, then the sets of $C_2$, and so on. The order within the sets of a same color is not
important and is chosen arbitrarily. We denote the sets resultantly ordered by $S_1, \ldots, S_m$ and
function $c$ maps the index of a set to its color. Therefore, $c(j) = i$ means that set $S_j$ has color $i$
($S_j \in C_i$). We fill by dynamic programming the table $T$, where $T[U, j]$ is meant to contain the
minimum number of sets in $C_{c(j)}$ among any subset of $\{S_1, \ldots, S_j\}$ that covers $U \subseteq \mathcal{U}$ and respects
the threshold vector.

As an initialization step, for each $U \subseteq \mathcal{U}$, we set $T[U, 1] = 1$ if $U \subseteq S_1$, and $T[U, 1] = \infty$
otherwise. For each $j \in [2, m]$, assuming that $T[U', j - 1]$ was already filled for every $U' \subseteq \mathcal{U}$, we
distinguish two cases to fill $T[U, j]$. If $S_j$ is the first set of the color class $C_{c(j)}$ then:

\[
T[U, j] =
\begin{cases}
0 & \text{if } T[U, j - 1] < \infty \quad \text{(discard } S_j\text{)} \\
1 & \text{if } T[U, j - 1] = \infty \text{ and } T[U \setminus S_j, j - 1] < \infty \quad \text{(add } S_j\text{)} \\
\infty & \text{otherwise}
\end{cases}
\]

Otherwise $S_j$ is not the first set in $C_{c(j)}$ and:

\[
T[U, j] = \min
\begin{cases}
T[U, j - 1] & \text{(discard } S_j\text{)} \\
v + 1 \text{ if } v < a_{c(j)} \text{ and } \infty \text{ otherwise} & \text{(add } S_j\text{)}
\end{cases}
\qquad \text{with } v = T[U \setminus S_j, j - 1].
\]

A standard induction shows that the instance is positive if and only if $T[\mathcal{U}, m] \neq \infty$. The only
costly operation in filling one entry of table $T$ is the set difference which can be done in $O(n)$
time. If we want to produce an actual solution (and not solely decide the problem), we can add
one bit in each entry $T[U, j]$ signaling whether or not $S_j$ should be taken. Should the instance be
positive, it then takes time $O(nm)$ to reconstruct a solution from a filled table $T$. Therefore, the
running time is $O(n|T| + nm) = O(nm2^n + nm)$. □
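The dynamic program of the proof can be sketched in a few lines of Python. This is a decision-only illustration under stated assumptions rather than the paper's exact table: subsets of the universe are encoded as bitmasks, the color classes are processed one after the other, and the base case is a virtual column in which only the empty set is covered (with zero sets). The function name and argument layout are hypothetical.

```python
INF = float("inf")


def colored_set_cover(n, classes, thresholds):
    """Decide Colored Set Cover with Thresholds.

    n          -- size of the universe {0, ..., n-1}
    classes    -- classes[i] lists the sets of color i, each as a bitmask
    thresholds -- thresholds[i] is the maximum number of sets of color i
    """
    full = (1 << n) - 1
    # feasible[U]: U can be covered using the color classes seen so far.
    feasible = [False] * (1 << n)
    feasible[0] = True
    for sets, a in zip(classes, thresholds):
        # count[U]: minimum number of sets of the current color used to
        # cover U; starting a new color resets the counter to 0.
        count = [0 if ok else INF for ok in feasible]
        for s in sets:
            new = count[:]                  # "discard S_j" option
            for U in range(1 << n):
                v = count[U & ~s]           # cover U \ S_j first ...
                if v < a and v + 1 < new[U]:
                    new[U] = v + 1          # ... then "add S_j"
            count = new
        feasible = [c < INF for c in count]
    return feasible[full]
```

Each of the $m$ sets triggers one pass over the $2^n$ bitmasks, which matches the $O^*(2^n)$ shape of Lemma 7; the extra factor $n$ of the lemma is absorbed here by constant-time bitmask operations on small universes.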


Theorem 8. Graph Motif can be solved in $O^*(3^k)$, where $k$ is the distance to clique.


Proof. Let $(G = (V, E), c : V \to C, M)$ be any instance of Graph Motif and assume $R$ is a
solution, that is $G[R]$ is connected and $c(R) = M$. If there is no solution, our algorithm will
detect this eventually. We first compute a set $S \subseteq V$ of size $k$ such that $C := V \setminus S$ is a clique.
This can be done in time $O^*(2^k)$ by branching over the two endpoints of a non-edge, or even in time
$O^*(1.2738^k)$ by applying the state-of-the-art algorithm for Vertex Cover on the complementary
graph [15]. Running through all the $2^k$ subsets of $S$, one can guess the subset $S' = R \cap S$ of $S$ which
is in the solution $R$. Let $S_1, S_2, \ldots, S_{k'}$ be the $k' \le k$ connected components of $G[S']$. It must hold
that $c(S') \subseteq M$, otherwise $R$ would not be a solution. Now, the problem boils down to finding a
non-empty (an empty subset would mean that $S' = R$ which can be easily checked) subset $C' \subseteq C$
such that $G[S' \cup C']$ is connected and $c(C') \subseteq M \setminus c(S')$. Then, the set $S' \cup C'$ can be extended
into a solution by adding vertices of $C \setminus C'$ with the right colors. The graph $G[S' \cup C']$ is connected
if and only if each connected component $S_j$ of $G[S']$ has at least one vertex in $N(C')$. We build
an equivalent instance of Colored Set Cover with Thresholds in the following way. The
ground set $\mathcal{U}$ is of size $k'$ with one element $x_j$ per connected component $S_j$ of $G[S']$. For each vertex
$v$ in $C$ colored by $i$, there is a set $S_v$ colored by $i$ such that $x_j \in S_v$ if and only if $N(v) \cap S_j \neq \emptyset$. For
each color $i$, the threshold $a_i$ is set to the multiplicity of $i$ in $M \setminus c(S')$. The number of elements is
$k'$ and the number of sets is polynomial. So, it takes time $O^*(2^{k'})$ to solve this instance. Therefore,
the overall running time is $O^*(2^k + \sum_{S' : S' \subseteq S} 2^{|S'|}) = O^*(2^k + \sum_{0 \le k' \le k} \binom{k}{k'} 2^{k'}) = O^*(3^k)$. □
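The final step of the running-time bound is an instance of the binomial theorem. Spelled out, grouping the subsets $S'$ of $S$ by their size (which upper-bounds the number $k'$ of connected components of $G[S']$):

```latex
\[
\sum_{S' \subseteq S} 2^{|S'|}
  \;=\; \sum_{k'=0}^{k} \binom{k}{k'} \, 2^{k'} \, 1^{k-k'}
  \;=\; (2+1)^{k}
  \;=\; 3^{k}.
\]
```

For instance, with $k = 2$ the sum is $1 + 2 \cdot 2 + 4 = 9 = 3^2$.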

Theorem 9. Graph Motif can be solved in $O^*(2^{2k \log k})$ on graphs with a vertex cover of size
$k$.



Figure 2: The subsets of $V$ relevant to the algorithm of Theorem 9.

Proof. We start similarly to the previous algorithm. We compute a minimum vertex cover $S$ of $G$
in time $O^*(2^k)$ (or $O^*(1.2738^k)$ [15]), and then guess in time $O^*(2^k)$ the subset $S' = S \cap R$, where
$R$ is a fixed solution. Again, we denote by $S_1, S_2, \ldots, S_{k'}$ the connected components of $G[S']$. We
remove $c(S')$ from the motif and we remove from $V$ the set $I'$ of the vertices of the independent
set $I := V \setminus S$ which have no neighbor in $S'$ (see Figure 2). Now, by the transformation presented
in the algorithm parameterized by distance to clique, the problem could be made equivalent to a
constrained version of Colored Set Cover with Thresholds where the intersection graph
(with an edge between two sets if they have a non-empty intersection) of the solution has to be
connected. Unfortunately, it is not clear whether or not this variant can be solved in time $2^{O(n)}$.
Thus, at this point, we have to do something different.

Let $R_d = \{r_1, r_2, \ldots, r_l\} \subseteq R \setminus S'$ be a minimal (inclusion-wise) set of vertices such that
$G[S' \cup R_d]$ is connected. We can observe that $l < k' \le k$. We guess in time $O^*(l! B_l)$ (where
$B_l$ is the $l$-th Bell number, i.e., the number of partitions of a set of size $l$) an ordered partition
$P := (A_1, A_2, \ldots, A_l)$ of the connected components $\{S_1, \ldots, S_{k'}\}$ such that, for each $i \in [l]$, (1)
$r_i$ has at least one neighbor in each connected component of $A_i$ and (2) if $i \ge 2$, $r_i$ has at least
one neighbor in a connected component of $\bigcup_{1 \le j < i} A_j$. Note that such an ordered partition always
exists since $G[S' \cup R_d]$ is connected. Now, we build the bipartite graph $B = (P \cup M', F)$, where
$M' = M \setminus c(S')$ and there is an edge between $A_i \in P$ and each copy of color $c \in M'$ if and only
if there is a vertex $v \in I$ colored by $c$ in the original graph $G$ and such that (1) $v$ has at least
one neighbor in each connected component of $A_i$ and (2) if $i \ge 2$, $v$ has at least one neighbor
in a connected component of $\bigcup_{1 \le j < i} A_j$. By construction, $\{\{A_i, c(r_i)\} \mid i \in [l]\}$ is a maximum
matching of size $|P| = l$ in graph $B$. Thus, we compute in polynomial time a maximum matching
$\{\{A_i, c_i\} \mid i \in [l]\}$ in $B$. Then, we obtain a solution to the Graph Motif instance by taking,
for each $i \in [l]$, any vertex $v_i$ colored by $c_i$ and having (1) at least one neighbor in each connected
component of $A_i$ and (2) if $i \ge 2$, at least one neighbor in a connected component of $\bigcup_{1 \le j < i} A_j$.
This can also be done in polynomial time and the existence of such a $v_i$ is guaranteed by the
construction of graph $B$. Then, we complete set $S' \cup \bigcup_{i \in [l]} \{v_i\}$ into a solution by taking any
vertices in $I \setminus I'$ with the right colors. As $l! \le l^l$, $B_l \le l^l$ (a sharper bound on $B_l$ is also known), and $l \le k$,
the overall running time is $O^*(2^k + 2^k k! B_k) = O^*(k^k k^k) = O^*(2^{2k \log k})$. □
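The counting step at the end of the proof can be checked numerically for small values. The sketch below computes Bell numbers with the classical Bell triangle and compares $l! \cdot B_l$ against $l^l \cdot l^l$; the function name `bell_numbers` is an illustrative choice and the check is a sanity test, not a proof.

```python
from math import factorial


def bell_numbers(n):
    """Return [B_0, ..., B_n] computed with the Bell triangle."""
    row = [1]
    bells = [1]
    for _ in range(n):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbor.
        new_row = [row[-1]]
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells


B = bell_numbers(10)
# Number of guesses l! * B_l versus the bound l^l * l^l used in the proof.
guess_counts = [(l, factorial(l) * B[l], l ** (2 * l)) for l in range(1, 11)]
```

Each triple in `guess_counts` holds $l$, the number of ordered partitions guessed, and the bound $l^{2l} = 2^{2l \log l}$.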

In the Edge Clique Cover problem, one asks, given a graph $G = (V, E)$ and an integer $k$,
for $k$ subsets $C_1, \ldots, C_k \subseteq V$, such that $\forall i \in [k]$, $G[C_i]$ is a clique, and $\forall e \in E$, $e$ lies in a clique
$C_i$ for some $i \in [k]$. The set $\{C_1, \ldots, C_k\}$ is called an edge clique cover of $G$. The edge clique
cover number of a graph $G$ is the smallest $k$ such that $G$ has an edge clique cover of size $k$. Edge
Clique Cover admits a kernel of size $2^k$ (which can be obtained in $O(n^4)$ time) [27] and, as
observed in [15], it can be solved by dynamic programming in time $2^{O(n+m)}$. Therefore, it can be
solved in time $2^{O(2^k + 2^{2k})} + O(n^4)$, that is $2^{2^{O(k)}} + O(n^4)$. On the negative side, Edge Clique
Cover cannot be solved in time $2^{2^{o(k)}}$ under ETH m-. But, we may imagine that the instance
comes with an optimal or close to optimal edge clique cover, or that we have a good heuristic to
compute it (a polynomial time approximation with sufficiently good ratio is unlikely E3)-.

Theorem 10. Graph Motif can be solved in time $2^{2^{O(k)}} + O(n^4)$, where $k$ is the edge clique
cover number, and in time $O^*(2^{2k \log k + k})$ if an edge clique cover of size $k$ is given as part of the
input.

Proof. Let $I = (G = (V, E), c, M)$ be any instance of Graph Motif. If not given, we first
compute an edge clique cover $\{C_1, \ldots, C_k\}$ of size $k$ in $G$, in time $2^{2^{O(k)}} + O(n^4)$ [27, 15].

We guess in time $O^*(2^k)$ the exact subset $\{C'_1, \ldots, C'_{k'}\} \subseteq \{C_1, \ldots, C_k\}$ of cliques $C_i$ such that
$C_i \cap R$ is non-empty, for a fixed solution $R$. Now, we turn the instance into an equivalent instance
where the motif has size $|M| + k'$ and the graph has at most $|V| + k'$ vertices and a vertex cover of
size $k'$. The new graph is a bipartite graph $B = (A \cup W, F)$ such that $A$ contains one vertex $v(C'_i)$
per clique $C'_i$ (so, $A$ is a vertex cover of graph $B$ of size $k' \le k$), $W = C'_1 \cup \cdots \cup C'_{k'} \subseteq V$, and there
is an edge in $F$ between $v(C'_i) \in A$ and $w \in W$ if and only if $w \in C'_i$. Each vertex in $W$ keeps the
color it had in $G$. A fresh color $\gamma$ is given to the $k'$ vertices of $A$, and color $\gamma$ is added to the motif
$M$ with multiplicity $k'$. This coloring is denoted by $c'$ and $M' := M \cup \{\gamma, \ldots, \gamma$ ($k'$ times)$\}$. We
run on the instance $I' = (B, c', M')$ the algorithm parameterized by the vertex cover number of
Theorem 9. This algorithm has an overall running time of $O^*(2^k 2^{2k \log k})$, if the edge clique cover
is given, and $2^{2^{O(k)}} + O(n^4)$ otherwise.

We now explain why the reduction is correct. We first claim that the set $A \cup R$ is a solution
for the instance $I'$. The colors of $A \cup R$ consist of $k'$ occurrences of $\gamma$ plus the colors of $M$, which
matches the multiset $M'$. Now, we show that $B[A \cup R]$ is connected by exhibiting a path between any
pair $x, y$ of vertices in $A \cup R$. Let $\psi : A \cup R \to R$ be the identity function when restricted to $R$
and map vertex $v(C'_i) \in A$ to an arbitrary fixed vertex of $C'_i \cap R$. By construction $C'_i \cap R \neq \emptyset$,
so $\psi$ is well-defined. As $G[R]$ is connected there is a path between $\psi(x)$ and $\psi(y)$ in $G[R]$:
$\psi(x) = u_1, u_2, \ldots, u_h = \psi(y)$. By definition of a clique cover, any two consecutive vertices $u_\ell$
and $u_{\ell+1}$ ($\ell \in [h - 1]$) along this path are in a same clique $C'_{i_\ell}$. Therefore, in $B[A \cup R]$ there is
a corresponding path $u_\ell, v(C'_{i_\ell}), u_{\ell+1}$. Also $\psi(x)$ (resp. $\psi(y)$) is either $x$ (resp. $y$) or linked by an
edge to $x$ (resp. $y$). Overall, this gives a path from $x$ to $y$ in $B[A \cup R]$.

Conversely, assume there is a solution $S$ to $I'$. Set $S$ has to contain $A$, otherwise the color $\gamma$ is
not represented $k'$ times. So, $S = A \uplus R'$. We claim that $R'$ is a solution for the instance $I$. In order
to match the colors of $M'$, the colors of $R'$ should match the multiset constraint of $M$. It remains
to argue why $G[R']$ is connected. Let $x, y$ be any two vertices of $R'$. Since $B$ is bipartite and $B[S]$
is connected, there is a path in $B[S]$: $x = u_1, v(C'_{i_1}), u_2, v(C'_{i_2}), u_3, \ldots, u_{h-1}, v(C'_{i_{h-1}}), u_h = y$ with
$u_\ell \in R'$ for any $\ell \in [h]$. As $u_\ell$ and $u_{\ell+1}$ are in the same clique $C'_{i_\ell}$ they are linked by an edge
in $G[R']$. Thus, $x = u_1, u_2, u_3, \ldots, u_h = y$ is a path in $G[R']$. □
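The construction of the instance $I'$ used in this proof can be sketched as follows. This is an illustration under stated assumptions: the guessed cliques are passed as a list of vertex sets, clique vertices are represented by tuples `("clique", i)` assumed not to clash with original vertex names, and the string `"gamma"` stands for the fresh color and is assumed not to be used in the original coloring.

```python
from collections import Counter

GAMMA = "gamma"  # hypothetical name for the fresh color


def build_bipartite_instance(color, motif, cliques):
    """Build (B, c', M') from the guessed cliques C'_1, ..., C'_k'.

    color   -- dict: vertex of G -> color
    motif   -- iterable of colors (the multiset M)
    cliques -- list of vertex sets, the guessed cliques meeting the solution
    """
    adj = {}
    new_color = {}
    for i, clique in enumerate(cliques):
        a = ("clique", i)            # the vertex v(C'_i) of side A
        adj[a] = set(clique)
        new_color[a] = GAMMA
        for w in clique:             # side W keeps the original colors
            adj.setdefault(w, set()).add(a)
            new_color[w] = color[w]
    new_motif = Counter(motif)
    new_motif[GAMMA] += len(cliques)  # gamma has multiplicity k'
    return adj, new_color, new_motif


# Triangle {0, 1, 2} plus the edge {2, 3}, covered by two cliques.
color = {0: "r", 1: "g", 2: "b", 3: "r"}
adj_b, color_b, motif_b = build_bipartite_instance(
    color, ["r", "b", "r"], [{0, 1, 2}, {2, 3}]
)
```

In the resulting graph every edge has exactly one endpoint among the clique vertices, so these $k'$ vertices form the vertex cover that the algorithm of Theorem 9 is then run on.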

The correctness of the reduction crucially relied on the fact that every edge is fully contained in 
at least one clique of the cover. This would not be the case with a vertex clique cover (a partition 
of the vertex set into sets inducing cliques). In Section 3.3 we give a more complicated FPT
algorithm parameterized by the vertex clique cover size (if such a cover is given in the input). It 
is not surprising that the edges going from one clique to another play an important role in the 
greater difficulty of the parameterized algorithm. 

Ganian [25], Theorem [2] and Theorem [5] prove that Graph Motif is in FPT if the parameter 
is the vertex cover number or the distance to clique. Therefore, the problem has a kernel for these 
two parameters [39]. Though, this does not imply that the size of the corresponding kernels is
polynomial. We show that the corresponding kernels cannot be polynomial unless NP $\subseteq$ coNP/poly.
Theorem 11. Unless NP $\subseteq$ coNP/poly, Graph Motif has no polynomial kernel when
parameterized by the vertex cover number or the distance to clique, even for (i) motifs with only 3 colors
or (ii) when the motif is colorful.

Proof. We only detail the proof for (i) for parameter vertex cover. We will define an
OR-cross-composition [9] from the NP-complete X3C problem, stated as follows: given an integer $q$, a set
$X = \{x_1, x_2, \ldots, x_{3q}\}$ and a collection $\mathcal{S} = \{S_1, \ldots, S_{|\mathcal{S}|}\}$ of 3-element subsets of $X$, the goal is
to decide if $\mathcal{S}$ contains a subcollection $\mathcal{T} \subseteq \mathcal{S}$ such that $|\mathcal{T}| = q$ and each element of $X$ occurs in
exactly one element of $\mathcal{T}$. Given $t$ instances, $(X_1, \mathcal{S}_1), (X_2, \mathcal{S}_2), \ldots, (X_t, \mathcal{S}_t)$, of X3C, we define our
equivalence relation $\mathcal{R}$ such that any strings that are not encoding valid instances are equivalent,
and $(X_i, \mathcal{S}_i)$, $(X_j, \mathcal{S}_j)$ are equivalent if and only if $|X_i| = |X_j|$ and $|\mathcal{S}_i| = |\mathcal{S}_j|$. We will build an
instance $(G, c, M)$ of Graph Motif parameterized by the vertex cover number, where $G$ is the
input graph, $c$ the coloring function and $M$ the motif, such that there is a solution for Graph
Motif if and only if there is an $i \in [t]$ such that there is a solution for $(X_i, \mathcal{S}_i)$. We will now
describe how to build such an instance of Graph Motif. The graph $G$ consists of $t$ independent
nodes $r_1, r_2, \ldots, r_t$. There are also $O((3q)^3)$ nodes $s_{x,y,z}$, $1 \le x < y < z \le 3q$, with an edge
between $r_i$ and $s_{x,y,z}$ if and only if the 3-element subset $\{x, y, z\}$ exists in $\mathcal{S}_i$. Finally, there are
$|X_i| = 3q$ nodes $a_i$, $1 \le i \le 3q$, and there is an edge between $a_i$ and every subset $s_{x,y,z}$ where
$x_i$ occurs (see Figure 3). The coloration is $c(r_i) = 1$, for all $1 \le i \le t$, $c(s_{x,y,z}) = 2$ for all
$1 \le x < y < z \le 3q$, and $c(a_i) = 3$, $1 \le i \le 3q$. The multiset $M$ consists of 1 occurrence of the
color 1, $q$ occurrences of color 2 and $3q$ occurrences of color 3.



Figure 3: Illustration of the construction of $G$ for parameter vertex cover (legend: Color 1, Color 2, Color 3). The motif consists of 1
occurrence of color 1, $q$ of color 2 and $3q$ of color 3.


It is easy to see that $\{s_{x,y,z} \mid 1 \le x < y < z \le 3q\} \cup \{a_i \mid 1 \le i \le 3q\}$ is a vertex cover for $G$ (as
its removal leaves an independent set) and that its size is polynomial in $3q$ and hence in the size
of the largest instance.
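The construction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: elements are the integers 1..3q, the node names `("r", i)`, `("s", triple)`, `("a", i)` and the helper names are hypothetical, and the motif is stored as a dictionary from color to multiplicity.

```python
from itertools import combinations

def cross_compose(instances, q):
    """Build (edges, color, motif) from X3C instances over the elements 1..3q.

    `instances` is a list; each entry is a collection of 3-element subsets.
    Node names ("r", i), ("s", triple) and ("a", i) are illustrative only.
    """
    n = 3 * q
    color, edges = {}, set()
    for triple in combinations(range(1, n + 1), 3):
        color[("s", triple)] = 2              # one node per possible triple
        for element in triple:                # a_i sees every triple containing i
            edges.add((("a", element), ("s", triple)))
    for element in range(1, n + 1):
        color[("a", element)] = 3
    for index, collection in enumerate(instances, start=1):
        color[("r", index)] = 1               # one node per input instance
        for subset in collection:
            edges.add((("r", index), ("s", tuple(sorted(subset)))))
    motif = {1: 1, 2: q, 3: n}                # color -> multiplicity
    return edges, color, motif

def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)

# Two equivalent instances (same |X| and same number of sets), q = 2.
instances = [[{1, 2, 3}, {4, 5, 6}], [{1, 2, 4}, {3, 5, 6}]]
edges, color, motif = cross_compose(instances, q=2)
cover = {v for v in color if v[0] in ("s", "a")}
print(is_vertex_cover(edges, cover), len(cover))
```

The size of `cover` depends only on q, not on the number of composed instances, which is the property the cross-composition needs.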

Let us show that there is a solution for our instance of Graph Motif if and only if at least
one of the $(X_i, \mathcal{S}_i)$'s has a solution of size $q$.

Suppose that $(X_i, \mathcal{S}_i)$ has a solution $T_i$ of size $q$. We set $R = \{r_i\} \cup \{s_{x,y,z} \mid \{x,y,z\} \in
T_i\} \cup \{a_j \mid 1 \le j \le 3q\}$. One can easily check that $G[R]$ is connected and that $c(R) = M$.

Conversely, suppose now that there is a solution $R \subseteq V$ such that $G[R]$ is connected and
$c(R) = M$. Due to the motif, only one of the nodes $r_i$ is in $R$ and all nodes $a_j$ are in $R$. We
claim that there is then a solution $T_i$ in $(X_i, \mathcal{S}_i)$, where $i$ is the index of the only node $r_i$ in $R$.
We add in $T_i$ the $q$ sets $\{x, y, z\}$ such that $s_{x,y,z} \in R$. Since $R$ is a solution, the nodes $s_{x,y,z}$ in
$R$ correspond to a partition of $X$; otherwise, one of the nodes $a_j$ would be disconnected. Then,
$T_i$ covers exactly all the elements of $X_i$. By the connectivity constraint, the $q$ sets added in $T_i$ all
occur in the instance $i$ such that $r_i \in R$.

If the considered parameter is the distance to clique, one can consider the nodes $r_1, r_2, \ldots, r_t$
as a clique. The removal of $\{s_{x,y,z} \mid 1 \le x < y < z \le 3q\} \cup \{a_i \mid 1 \le i \le 3q\}$ leaves one clique and its
size is polynomial in the size of the largest instance. The correctness is the same as for parameter
vertex cover number, as only one occurrence of color 1 is in the motif.

The second item (ii) of the statement can be proven similarly following the ideas of [51 Theorem
6]. That is, the nodes $s_{x,y,z}$ are duplicated $q$ times, i.e. into nodes $s^i_{x,y,z}$, $1 \le i \le q$, where
$c(s^i_{x,y,z}) = i$, forcing to have at most $q$ of such nodes in the solution. Also, the $3q$ nodes $a_i$ receive
a fresh unique color (say with colors $q+1$ to $q+1+3q$), forcing all of them to be in any solution.
The nodes $r_1, r_2, \ldots, r_t$ are colored with color $q+1+3q+1$.

□ 


3.3 Parameters with unbounded neighborhood diversity 

This section disproves the idea that Graph Motif is only tractable for classes with bounded 
neighborhood diversity. Indeed, we show that Graph Motif is in FPT parameterized by the size 
of a vertex clique cover or by the distance to co-cluster. The former algorithm creates a win/win 
based on Kőnig's theorem applied to a bounded number of auxiliary bipartite graphs. The latter is
simpler and uses as subroutines the algorithms parameterized by vertex cover number and distance 
to clique. 

In the Vertex Clique Cover problem (also known as Clique Partition), one asks, given
a graph $G = (V, E)$ and an integer $k$, for a partition of the vertices into $k$ subsets $C_1, \ldots, C_k \subseteq V$,
such that $\forall i \in [k]$, $G[C_i]$ is a clique. The set $\{C_1, \ldots, C_k\}$ is called a vertex clique cover of $G$.
The vertex clique cover number of a graph $G$ is the smallest $k$ such that $G$ has a vertex clique
cover of size $k$. This problem is equivalent to the Graph Coloring problem since a graph has
a vertex clique cover of size $k$ if and only if its complement is $k$-colorable. Therefore, Vertex
Clique Cover is unlikely to be in XP. However, if a vertex clique cover comes with the input,
we show that Graph Motif is in FPT for parameter vertex clique cover number. One can notice
that Graph Motif is NP-hard in 2-colorable graphs. This is a striking example of how much easier
Graph Motif can be on the denser counterpart of two complementary classes.
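The equivalence between vertex clique covers and colorings of the complement can be checked mechanically on small graphs. The sketch below is illustrative only; the function names and the example graph are assumptions, not part of the paper.

```python
from itertools import combinations

def is_clique_partition(vertices, edges, parts):
    """True if `parts` partitions `vertices` and every part induces a clique."""
    edge_set = {frozenset(e) for e in edges}
    covered = [v for part in parts for v in part]
    if sorted(covered) != sorted(vertices):
        return False
    return all(frozenset((u, v)) in edge_set
               for part in parts for u, v in combinations(part, 2))

def complement_edges(vertices, edges):
    """Edges of the complement graph."""
    edge_set = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(vertices, 2)
            if frozenset((u, v)) not in edge_set]

def is_proper_coloring(edges, coloring):
    """True if no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)

# A triangle {0, 1, 2} and an edge {3, 4}, joined by the edge (2, 3).
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
parts = [[0, 1, 2], [3, 4]]
coloring = {v: index for index, part in enumerate(parts) for v in part}
print(is_clique_partition(vertices, edges, parts),
      is_proper_coloring(complement_edges(vertices, edges), coloring))
```

Using the index of each part as a color turns a clique partition of the graph into a proper coloring of its complement, and conversely.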

To realize that vertex clique cover number has unbounded neighborhood diversity, think of 
the complement of a bipartite graph. The vertex clique cover is of size 2 but the neighborhood 
diversity could be arbitrary; for parameter distance to co-cluster, think of the complement of a 
cluster graph with an unbounded number of cliques. 
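To make the first example concrete, here is a small sketch that computes the neighborhood diversity of one particular complement of a bipartite graph: two cliques whose cross edges are nested. It assumes the usual definition (two vertices have the same type when their neighborhoods agree outside the two vertices themselves); the function names and the choice of cross edges are illustrative assumptions.

```python
def neighborhood_diversity(adj):
    """Number of classes of the 'same type' relation: N(u) minus v equals N(v) minus u."""
    representatives = []
    for v in adj:
        if not any(adj[v] - {r} == adj[r] - {v} for r in representatives):
            representatives.append(v)
    return len(representatives)

def two_cliques_with_nested_cross_edges(n):
    """Complement of a bipartite graph: cliques A and B, a_i adjacent to b_j iff j <= i."""
    A = [("a", i) for i in range(n)]
    B = [("b", j) for j in range(n)]
    adj = {v: set() for v in A + B}
    for side in (A, B):
        for u in side:
            adj[u] |= set(side) - {u}      # each side is a clique
    for i in range(n):
        for j in range(i + 1):
            adj[("a", i)].add(("b", j))
            adj[("b", j)].add(("a", i))
    return adj

for n in (3, 4, 6):
    print(n, neighborhood_diversity(two_cliques_with_nested_cross_edges(n)))
```

The vertex clique cover has size 2 for every n, while the number of types grows with n.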

Theorem 12. Graph Motif can be solved in time $O^*(k^{O(k)})$ where $k$ is the vertex clique cover
number, provided that the vertex clique cover is given as part of the input.

Proof. Let $(G = (V, E), c, M)$ be the instance and suppose that the partition into cliques $\{C_1, \ldots, C_k\}$
of the graph $G$ is given. We remove all the vertices whose color does not belong to $M$, since they
cannot be part of a solution. Observe also that this can only decrease the vertex clique cover
number. First, we guess in time $O^*(2^k)$ which of the cliques $\mathcal{S} = \{C'_1, \ldots, C'_{k'}\} \subseteq \{C_1, \ldots, C_k\}$
have a non-empty intersection with a fixed solution $R$, and we remove from $G$ the cliques which
are not in $\mathcal{S}$.

We denote by $E(X, Y)$ the set of edges of $E$ having one endpoint in $X$ and the other in $Y$. We
call transversal edge an edge in $E(C'_i, C'_j)$ with $i \ne j \in [k']$. Such a transversal edge is said to have
type $\{i, j\}$. An inner edge is an edge which lies within the same clique $C'_i$ for some $i \in [k']$. As
$G[R]$ is connected, one may observe that there is a set $E_c \subseteq E(G[R])$ of $k' - 1$ transversal edges
such that between every pair of vertices $u, v \in R$, there is a path made only of edges in $E_c$ and
inner edges. Informally, $E_c$ is a spanning tree of the $k'$ cliques of $\mathcal{S}$ seen as vertices (see Figure 4).
More precisely, the edges of $E_c$ form a subforest of $G$. We guess in time $O^*(k'^{2(k'-1)})$ the type of
each edge in $E_c$. We denote by $T_c$ the corresponding set of $k' - 1$ types.

One may first think of the transversal edges of $E_c$ as a matching. However, two edges of $E_c$
leaving the same clique $C'_i$ can share the same vertex in $C'_i$. This piece of information
will actually prove useful for the algorithm to work. Therefore, we also guess in time $O^*(B_{2(k'-1)}) =
O^*((2k')^{2k'})$ whether two edges in $E_c$ of types $\{i, j\}$ and $\{i, j'\}$ happen to have a common endpoint.
One can see it the following way: among the potentially $2(k' - 1)$ endpoints of the matching $E_c$,
we need to find the correct partition into the classes of the equality relation. As $R$ is a solution,
$M \subseteq c(C'_1 \cup \ldots \cup C'_{k'})$ holds. Therefore, it all boils down to finding $k' - 1$ transversal edges whose
set of types is precisely $T_c$ and such that the multiset of colors of their at most $2(k' - 1)$ endpoints
is included in $M$.

Figure 4: The cliques $C'_1, C'_2, \ldots, C'_{k'}$, the edge interaction between $C'_1$ and $C'_2$, and the corresponding
auxiliary bipartite graph $B_{1,2}$ when the multiset $M$ contains $a$ with multiplicity exactly
one and $c$ with multiplicity at least 2 (indeed, observe that the edge $cc$ is present in $B_{1,2}$ but not
the edge $aa$).

For each type $\{i, j\} \in T_c$, we build the bipartite graph $B_{i,j} = (H_i \uplus H_j, F)$ where $H_i$ (resp. $H_j$)
are all the colors of the vertices of $C'_i$ (resp. $C'_j$). There is an edge in $F$ between color $c \in H_i$ and
color $c' \in H_j$ whenever there is a transversal edge of type $\{i, j\}$ whose endpoint in $C'_i$ is colored by
$c$ and whose endpoint in $C'_j$ is colored by $c'$. In the special case when $c$ and $c'$ are in fact the same
color and that color appears only once in $M$, we remove the edge $cc'$ from $F$. We indeed know
that no solution will contain such a transversal edge. We remove all the isolated vertices of every
$B_{i,j}$. We also remove every vertex $c \in H_i$ from $B_{i,j}$ if there is a $j'$ such that we have guessed that
the transversal edges of type $\{i, j\}$ and $\{i, j'\}$ share a common endpoint and $c$ is not in the $H_i$ of $B_{i,j'}$
(it was an isolated vertex).

The rest of the algorithm is a win/win based on the classic Kőnig's
theorem which states that, in a bipartite graph, the size of a minimum vertex cover is equal to
the size of a maximum matching. The core idea is that either there is a large diversity of colors
for the endpoints of a transversal edge, and a suitable transversal edge can always be found at the
end, or there is only a limited choice of colors for those endpoints and one can branch over those
possibilities. By branching, we commit ourselves to find a transversal edge $uv$ whose endpoint,
say, $u$ has a specific color $c$. In that case, we say that the endpoint $u$ has its color fixed. In a first
step, we will branch until the endpoints of all the transversal edges are fixed (or can always be
fixed). In a second step, we will build a solution respecting the fixed colors.

We distinguish two cases. Either there is a matching $S_{i,j}$ in $B_{i,j}$ with at least $2k' - 3$ edges.
Then, for any multiset of colors $M_0 \subseteq M$ of size at most $2k' - 4$, there is an edge $\{c, c'\}$ in $S_{i,j}$
such that $M_0 \cup \{c, c'\} \subseteq M$. Indeed, since $|S_{i,j}| > |M_0|$, there is at least one edge of $S_{i,j}$ whose
endpoints are not colored by an element of $M_0$. Recall also that there can be an edge between
two vertices of the same color only if the multiplicity of that color in $M$ is at least 2. Therefore,
whatever the multiset $M_0 \subseteq M$ of colors at the endpoints of the $k' - 2$ other transversal edges is,
one can always find a transversal edge of type $\{i, j\}$ colored by $c$ and $c'$ such that $M_0 \cup \{c, c'\} \subseteq M$.
Thus, we can forget about this particular transversal edge, and we say that the transversal edge
of type $\{i, j\}$ is abundant.

Otherwise, there is a vertex cover of $B_{i,j}$ with at most $2k' - 4$ vertices. Note that a vertex
$c \in H_i$ (resp. $H_j$) in the graph $B_{i,j}$ corresponds to choosing color $c$ for the endpoint in $C'_i$ (resp. $C'_j$)
of the transversal edge of type $\{i, j\}$. Therefore, we branch on those at most $2k' - 4$ possibilities
of coloring one of the endpoints of the transversal edge of type $\{i, j\}$.
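The two cases can be illustrated with a short sketch that computes a maximum matching of one auxiliary bipartite graph and, when the matching is small, extracts a minimum vertex cover through the standard alternating-path construction behind Kőnig's theorem. This is not the paper's implementation: the matching routine (Kuhn's augmenting paths), the data layout (`adj` maps each color on the left side to the list of adjacent colors on the right side) and all function names are assumptions made for the example.

```python
def max_matching(left, adj):
    """Kuhn's augmenting-path algorithm; returns a dict right_vertex -> left_vertex."""
    match_right = {}

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in left:
        try_augment(u, set())
    return match_right

def koenig_cover(left, adj, match_right):
    """Minimum vertex cover (left part, right part) built from a maximum matching."""
    matched_left = set(match_right.values())
    reached_left = {u for u in left if u not in matched_left}
    reached_right = set()
    stack = list(reached_left)
    while stack:                      # alternating search from unmatched left vertices
        u = stack.pop()
        for v in adj[u]:
            if v in reached_right:
                continue
            reached_right.add(v)
            w = match_right.get(v)
            if w is not None and w not in reached_left:
                reached_left.add(w)
                stack.append(w)
    return {u for u in left if u not in reached_left}, reached_right

def win_win(left, adj, k_prime):
    """Return ("abundant", None) or ("branch", cover) for one type {i, j}."""
    matching = max_matching(left, adj)
    if len(matching) >= 2 * k_prime - 3:
        return "abundant", None
    return "branch", koenig_cover(left, adj, matching)

print(win_win(["a", "b", "c"], {"a": ["x"], "b": ["y"], "c": ["z"]}, 3)[0])
print(win_win(["a", "b"], {"a": ["x", "y", "z"], "b": ["x"]}, 3)[0])
```

In the "branch" case the returned cover has as many vertices as the maximum matching, hence at most $2k' - 4$, and its vertices are the colors to branch on.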

This describes what we do when no endpoint of the transversal edge has its color fixed. Now,
suppose we have a transversal edge of type $\{i, j\}$ such that the color of the endpoint in, say, $C'_i$ is
fixed to color $c$. If the number of neighbors of vertex $c \in H_i$ in the graph $B_{i,j}$ is at least $2k' - 3$,
we declare this edge abundant and no longer care about this edge. Otherwise, if this number is
at most $2k' - 4$, we branch on the at most $2k' - 4$ ways of coloring the endpoint in $C'_j$ of the
transversal edge of type $\{i, j\}$.

Note also that when we fix the color of an endpoint in $C'_i$ of a transversal edge of type $\{i, j\}$,
it also fixes the color of the endpoints in $C'_i$ of potential transversal edges of type $\{i, j'\}$ which
we have guessed to share a common endpoint (in $C'_i$) with the transversal edge of type $\{i, j\}$.
This potential set of transversal edges might, however, very well be empty. After a branching of
depth at most $2k' - 2$ and arity at most $2k' - 4$, we reach a situation where each transversal edge
is either abundant or both its endpoints have fixed colors. We fix the colors of the endpoints of
the abundant transversal edges (which are not fixed yet) in the following way. We root each tree of the
forest $E_c$ arbitrarily. We then consider an arbitrary parent of some deepest leaves.
We fix the colors of the endpoints corresponding to this parent and all its children. We explained
above why this is always possible. We iterate this until every vertex of this tree has its color fixed.

Now, all the endpoints of the transversal edges have their color fixed. By guessing the set $T_c$
of types of the transversal edges and whether or not two transversal edges are incident, we have
in fact guessed the shape of a forest that those edges constitute in the original graph $G$. For each
tree of this abstract forest, we have to compute the actual transversal edges. At this point, a node
in this tree is naturally labeled by a pair (clique, color) $(C'_i, c)$. We associate a subset of vertices to
a node of this labeled tree in a bottom-up fashion. Each leaf labeled by $(C'_i, c)$ is associated with
the subset $J_{i,c} \subseteq C'_i$ of vertices colored by $c$ (that is, $\forall u \in C'_i$, $u \in J_{i,c} \Leftrightarrow c(u) = c$). We associate
each inner node labeled by $(C'_i, c)$ whose $r$ children are associated with sets $J_{i_1,c_1}, \ldots, J_{i_r,c_r}$ with
the subset $J_{i,c} \subseteq C'_i$ of vertices colored by $c$ which have at least one neighbor in $J_{i_h,c_h}$ for each
$h \in [r]$. When the last node $e$ of the tree gets its set $J$, this set is non-empty if we have made all
our guesses according to solution $R$. We define $e$ as the root of the tree. Now, in a top-down
manner, we find the corresponding transversal edges. We take in the solution an arbitrary vertex
$u \in J$. In each set associated with a child of $e$ we take arbitrarily a neighbor of $u$; and so on,
down to the leaves. By construction, this is always possible. It is possible that while doing this
process on two different trees of the forest, we take "twice" the same vertex in some $C'_i$. This
can only help since the goal is not to exceed the multiplicities of $M$. Equivalently, we could have
guessed the forest of transversal edges with the least number of connected components, to forbid
this possibility.
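The bottom-up and top-down passes described above can be sketched as follows for a single labeled tree. The representation is an assumption made for the example: a tree node is a triple `(clique, color, children)`, `members` maps a clique index to its vertex set, `col` gives vertex colors, and `adj` maps each vertex to its neighbor set; the function names are hypothetical.

```python
def bottom_up(node, adj, members, col):
    """Return (J, annotated children) for a node (clique, color, children)."""
    clique, color, children = node
    annotated = [bottom_up(child, adj, members, col) for child in children]
    J = {u for u in members[clique]
         if col[u] == color and all(adj[u] & child_J for child_J, _ in annotated)}
    return J, annotated

def top_down(J, annotated, adj, chosen, parent=None):
    """Pick one vertex per node, each adjacent to the vertex picked for its parent."""
    pool = J if parent is None else J & adj[parent]
    u = min(pool)                     # any vertex of the pool would do
    chosen.append(u)
    for child_J, grandchildren in annotated:
        top_down(child_J, grandchildren, adj, chosen, u)

def realize_tree(root, adj, members, col):
    """Vertices realizing the labeled tree (in preorder), or None if the root set is empty."""
    J, annotated = bottom_up(root, adj, members, col)
    if not J:
        return None
    chosen = []
    top_down(J, annotated, adj, chosen)
    return chosen

members = {0: {0, 1}, 1: {2, 3}, 2: {4}}
col = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}
adj = {v: set() for v in col}
for u, v in [(0, 1), (2, 3), (0, 2), (1, 3), (3, 4)]:
    adj[u].add(v)
    adj[v].add(u)
tree = (0, "a", [(1, "b", [(2, "c", [])])])
print(realize_tree(tree, adj, members, col))
```

Every vertex kept by the bottom-up pass has a neighbor in each child's set, so the top-down pass never meets an empty pool.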

We summarize the algorithm.

1) Guess the shape of the forest formed by a fixed subset $E_c$ of $k' - 1$ transversal edges ensuring
the connectivity between the cliques in a fixed solution $R$.

2) Win/win to properly guess the colors of the endpoints of $E_c$: (a) either the variety of colors is
more than enough and this color can be fixed arbitrarily later, or (b) there are only few choices and
one can branch.

3) For each tree of $E_c$, find the transversal edges: one bottom-up procedure to check if there is
indeed a solution and one top-down to select the actual vertices.

4) As $R$ is a solution, one can complete this to a solution by taking arbitrary (since everything is
connected) vertices with the right colors.

Observe that during step 2), we first do all the branchings advocated by (b). Then we reach a
point when no further branching is possible, and we fix the colors arbitrarily as indicated by (a).
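The costs of the successive guesses multiply: $2^k$ for the cliques met by the solution, $k^{2k-2}$ for the types, $(2k)^{2k}$ for the shared endpoints, and $(2k-4)^{2k-2}$ for the branching on colors. A rough tally, using $k' \le k$ and assuming $k \ge 2$ so that every factor is non-negative, could read as follows; the constants are deliberately loose.

```latex
\begin{align*}
2^{k} \cdot k^{2k-2} \cdot (2k)^{2k} \cdot (2k-4)^{2k-2}
  &\le 2^{k} \cdot k^{2k} \cdot (2k)^{2k} \cdot (2k)^{2k} \\
  &= 2^{5k}\, k^{6k} \\
  &\le (4\sqrt{2}\,k)^{6k} = k^{O(k)}.
\end{align*}
```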

The running time of the algorithm is $O^*(2^k k^{2k-2} (2k)^{2k} (2k-4)^{2k-2}) = O^*((4\sqrt{2}k)^{6k}) = O^*(k^{O(k)})$. □

Theorem 13. Graph Motif can be solved in $O^*(2^{2k \log k})$, where $k$ is the distance to co-cluster.

Proof. Let $(G = (V, E), c, M)$ be any instance of Graph Motif and let $R$ be a solution. Let $X$
be a minimum subset (of size $k$) whose deletion makes the graph $G$ a co-cluster. Co-cluster graphs
are exactly the $\overline{P_3}$-free graphs. A $P_3$ graph is a path with 3 nodes (a $\overline{P_3}$ graph is its complement,
thus one node and one edge). We can apply a bounded-depth branching algorithm by finding a
$\overline{P_3}$ and branching on which of the three vertices to put into the deletion set. This leads to an $O^*(3^k)$
algorithm to find $X$. Let $S_1, S_2, \ldots, S_q$ be the partition of the co-cluster graph $G[V \setminus X]$ into
maximal independent sets. The idea is to run the algorithm parameterized by the vertex cover
number if at most one $S_i$ is inhabited by solution $R$, and the one parameterized by distance to
clique otherwise. Therefore, we distinguish two cases:

(A) $|\{i \in [q] \mid R \cap S_i \ne \emptyset\}| \le 1$,

(B) $|\{i \in [q] \mid R \cap S_i \ne \emptyset\}| \ge 2$.

In case (A) holds, we will find a solution by solving, for each $i \in [q]$, the instance $(G[X \cup
S_i], c|_{X \cup S_i}, M)$. As $X$ is a vertex cover of size $k$ in $G[X \cup S_i]$, this can be done in time $O^*(2^{2k \log k})$
by Theorem [9].

In case (B) holds, we can guess in time $n^2$ one vertex $s \in S_i \cap R$ and one vertex $t \in S_j \cap R$ with
$i \ne j \in [q]$. Then, we will find a solution by solving $(G' = (V \setminus \{s, t\}, E'), c|_{V \setminus \{s,t\}}, M \setminus c(\{s, t\}))$
where $E' = (E \cup \{\{u, v\} \mid u, v \in S_a, a \in [q]\})|_{V \setminus \{s,t\}}$. Indeed, if $Y \subseteq V \setminus \{s, t\}$ induces a connected
subgraph in $G'$, then $G[Y \cup \{s, t\}]$ is connected. As $G' - X$ is now a clique, this can be done in time
$O^*(3^k)$ by Theorem [5]. The overall running time is $O^*(3^k + q\,2^{2k \log k} + n^2 3^k) = O^*(2^{2k \log k})$. □
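The bounded-depth branching used at the start of the proof to find the deletion set $X$ can be sketched as below. It is a minimal illustration under assumptions: the obstruction is a triple of vertices spanning exactly one edge (an edge plus an isolated vertex), vertices are comparable labels, and the function names are hypothetical.

```python
from itertools import combinations

def find_co_p3(vertices, adj):
    """Return three vertices inducing exactly one edge, or None if there are none."""
    for u, v, w in combinations(sorted(vertices), 3):
        n_edges = (v in adj[u]) + (w in adj[u]) + (w in adj[v])
        if n_edges == 1:
            return (u, v, w)
    return None

def cocluster_deletion_set(vertices, adj, k):
    """Set of at most k vertices whose removal leaves a co-cluster graph, or None."""
    triple = find_co_p3(vertices, adj)
    if triple is None:
        return set()
    if k == 0:
        return None
    for x in triple:                  # one of the three vertices must be deleted
        rest = cocluster_deletion_set(vertices - {x}, adj, k - 1)
        if rest is not None:
            return rest | {x}
    return None

# Complete bipartite graph on {0, 1} and {2, 3}, plus a vertex 4 adjacent to 0 only.
vertices = {0, 1, 2, 3, 4}
adj = {0: {2, 3, 4}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1}, 4: {0}}
print(cocluster_deletion_set(vertices, adj, 1))
```

The recursion has depth at most k and branches three ways, which matches the $3^k$ factor mentioned in the proof.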

3.4 ETH-based lower bounds 

Here, we show that a parameterized subexponential algorithm (i.e., running in $O^*(2^{o(k)})$) solving
Graph Motif for the parameters $k$ that we considered in this section, is unlikely. We get those
negative results as a corollary of the fact that, while trying out all the subsets of vertices obviously
solves Graph Motif in time $O^*(2^n)$, a subexponential time algorithm (i.e., running in $2^{o(n)}$) is
unlikely:

Theorem 14. Under ETH, Graph Motif cannot be solved in time $2^{o(n)}$, even (i) on graphs
with distance 1 to cluster, and (ii) on trees.

Proof. Under ETH, Dominating Set restricted to graphs with degree 6 is not solvable in time
$2^{o(n)}$ where $n$ is the number of vertices of the input graph (2H. From a degree-6 graph $H$ and an
integer $t$, we build an instance $I = (G = (V, E), c : V \to C, M)$ of Graph Motif such that there
is a dominating set of size at most $t$ in $H$ if and only if $I$ is a YES-instance. First we show item
(i). There are $|V(H)| + 1$ different colors in $C$, one color $c_v$ for each vertex $v$ of $H$, and one special
color $c$. For each vertex $v$ in $H$, we introduce a clique in $G$ of size $|N_H[v]| + 1$ ($\le 8$) where one vertex
is colored by the special color $c$, and the others are colored by each color of $\{c_w \mid w \in N_H[v]\}$. We
add a vertex $z$ colored by $c$ and link it to all the other vertices colored by $c$ (in the cliques). The
motif $M$ consists of $c$ with multiplicity $t + 1$ and $c_v$ (for each $v \in V(H)$) with multiplicity 1. That
ends the construction. Observe that the number of vertices of $G$ is linear in $|V(H)|$ (it is at most
$8|V(H)| + 1$), and removing $z$ from $G$ gives a cluster graph of $|V(H)|$ cliques of size at most 8
each.

To obtain item (ii), $G$ is transformed in the following way: each clique is replaced by a star
where the center is the vertex with the special color $c$.

Those reductions are identical to the reduction showing that Graph Motif is hard on trees of
diameter 4 [2] (for (ii)) and Theorem 21 (for (i)), and therefore the reader is referred to paper [?]
for correctness. □
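For small inputs, the construction of item (i) can be built and checked by exhaustive search. The following sketch is only an illustration: the vertex names (`"z"`, `("centre", v)`, `("copy", v, w)`), the encoding of the color $c_w$ as the pair `("c", w)`, and the brute-force solver are assumptions made for the example, and the solver runs in exponential time.

```python
from collections import Counter
from itertools import combinations

def dominating_set_to_graph_motif(h_adj, t):
    """Build (adj, color, motif) from a graph H given as vertex -> set of neighbours."""
    color = {"z": "c"}                         # the extra vertex z has the special color
    adj = {"z": set()}
    for v, neighbours in h_adj.items():
        centre = ("centre", v)
        clique = [centre] + [("copy", v, w) for w in neighbours | {v}]
        color[centre] = "c"
        for _, _, w in clique[1:]:
            color[("copy", v, w)] = ("c", w)   # plays the role of c_w
        for x in clique:
            adj[x] = set(clique) - {x}
        adj[centre].add("z")
        adj["z"].add(centre)
    motif = Counter({"c": t + 1})
    motif.update(("c", v) for v in h_adj)
    return adj, color, motif

def has_motif(adj, color, motif):
    """Brute force: try every vertex subset of the right size."""
    size = sum(motif.values())
    for subset in combinations(list(adj), size):
        if Counter(color[v] for v in subset) != motif:
            continue
        chosen, seen, stack = set(subset), {subset[0]}, [subset[0]]
        while stack:                           # connectivity check inside the subset
            for w in adj[stack.pop()] & chosen:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen == chosen:
            return True
    return False

path = {0: {1}, 1: {0, 2}, 2: {1}}             # the middle vertex dominates the path
print(has_motif(*dominating_set_to_graph_motif(path, 1)))
```

On the path with three vertices and $t = 1$, a solution consists of $z$ together with the whole clique built for the middle vertex.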

Corollary 15. Under ETH, for every parameter upper-bounded by $n$, Graph Motif cannot be
solved in time $2^{o(k)}$, even on trees.

Among the six parameters for which we gave an FPT algorithm, two are not upper-bounded
by $n$ but by $n^2$: cluster editing and edge clique cover numbers. However, we can observe that the
graph built in item (i) of the proof of Theorem 14 has both a cluster editing of size $n$ (by removing
the $n$ edges between $z$ and the $n$ other vertices colored by $c$) and an edge clique cover of size $2n$.
Therefore, for all the six parameters, a subexponential parameterized algorithm in $2^{o(k)}$ would
disprove ETH.

We finally show finer lower bounds under SETH and SCH, for parameter vertex cover and
distance to clique. In particular, Theorem 17 implies that, even though there should be an
algorithm solving Graph Motif in time $O^*(c^k)$ with $c < 8$, and $k$ being the distance to a clique
(thereby improving over Theorem [8]), it is unlikely that $c$ goes below 2.

Theorem 16. Under SETH, for any $\varepsilon > 0$, Graph Motif cannot be solved in time $O((2 - \varepsilon)^k)$,
where $k$ is the vertex cover number.

Proof. In the Hitting Set problem, one is given a set of sets $\mathcal{S} = \{S_1, \ldots, S_m\}$ over elements
$X = \{x_1, \ldots, x_n\}$, and an integer $t$, and one has to find a set $X' \subseteq X$ (the hitting set) of size at
most $t$ such that $\forall S \in \mathcal{S}$, $S \cap X' \ne \emptyset$. It is known that under SETH, for any $\varepsilon > 0$, Hitting
Set is not solvable in time $O((2 - \varepsilon)^n)$ [16]. From any instance $(X, \mathcal{S}, t)$ of Hitting Set with
$n$ elements, we construct an equivalent instance $(G = (V, E), c, M)$ of Graph Motif where the
graph $G$ has a vertex cover of size $n$. We create one vertex $v(x_i)$ for each element $x_i$ of $X$ and one
vertex $v(S_j)$ for each set $S_j$ of $\mathcal{S}$. The element vertices (the $v(x_i)$'s) are colored by 1 and form a
clique, while the set vertices (the $v(S_j)$'s) are colored by 2 and constitute an independent set. We
link an element vertex to a set vertex if the corresponding element is in the corresponding set;
that is, $v(x_i)v(S_j) \in E \Leftrightarrow x_i \in S_j$. Therefore, $G$ is the adjacency split graph of the set-system
$(X, \mathcal{S})$ where the element vertices are the clique. $M$ contains 1 with multiplicity $t$ and 2 with
multiplicity $m$. Observe that the set of all the element vertices is a vertex cover of $G$ of size $n$.

If $X' = \{x_{a_1}, \ldots, x_{a_t}\}$ is a solution (potentially, add arbitrary elements to get a solution
with exactly $t$ elements) to the hitting set instance, then $R := \bigcup_{j \in [m]} \{v(S_j)\} \cup \{v(x_{a_1}), \ldots, v(x_{a_t})\}$
(obtained by taking all the set vertices and the $t$ element vertices corresponding to the elements of
$X'$) satisfies the multiset constraint. Also, the subgraph $G[R]$ is indeed connected by the definition
of a hitting set, and the fact that $\{v(x_{a_1}), \ldots, v(x_{a_t})\}$ is a clique.

Conversely, let $R \subseteq V$ be a solution for the constructed instance of Graph Motif. By the
multiset constraint, $R$ should contain all the vertices colored by 2: $\bigcup_{j \in [m]} \{v(S_j)\}$, and $t$ vertices
colored by 1: $\{v(x_{a_1}), \ldots, v(x_{a_t})\}$. We claim that $X' := \{x_{a_1}, \ldots, x_{a_t}\}$ is a hitting set (of size $t$).
Indeed, if a set $S_j$ was not hit by $X'$, then the set vertex $v(S_j)$ would not be connected to the
clique $\{v(x_{a_1}), \ldots, v(x_{a_t})\}$, and $G[R]$ would have at least 2 connected components. □
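The split-graph construction and both directions of the argument can be tried out on a toy set system. The sketch below is an illustration under assumptions: vertex names `("x", e)` and `("S", j)` and the helper names are hypothetical, and the checker simply tests the two Graph Motif conditions (colors and connectivity) for a given vertex set.

```python
from collections import Counter

def hitting_set_to_graph_motif(elements, sets, t):
    """Build (adj, color, motif): element vertices form a clique, set vertices are independent."""
    adj = {("x", e): set() for e in elements}
    adj.update({("S", j): set() for j in range(len(sets))})
    color = {v: (1 if v[0] == "x" else 2) for v in adj}
    elems = list(elements)
    for i, a in enumerate(elems):              # clique on the element vertices
        for b in elems[i + 1:]:
            adj[("x", a)].add(("x", b))
            adj[("x", b)].add(("x", a))
    for j, s in enumerate(sets):               # element-set incidences
        for e in s:
            adj[("x", e)].add(("S", j))
            adj[("S", j)].add(("x", e))
    return adj, color, Counter({1: t, 2: len(sets)})

def is_solution(R, adj, color, motif):
    """True if R has exactly the colors of the motif and induces a connected subgraph."""
    R = set(R)
    if not R or Counter(color[v] for v in R) != motif:
        return False
    start = next(iter(R))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()] & R:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == R

def candidate(hitting_set, sets):
    """All set vertices plus the element vertices of the proposed hitting set."""
    return {("S", j) for j in range(len(sets))} | {("x", e) for e in hitting_set}

sets = [{1, 2}, {2, 3}, {3, 4}]
adj, color, motif = hitting_set_to_graph_motif([1, 2, 3, 4], sets, t=2)
print(is_solution(candidate({2, 3}, sets), adj, color, motif))
print(is_solution(candidate({1, 2}, sets), adj, color, motif))
```

The second candidate fails because the set $\{3, 4\}$ is not hit, so its set vertex is isolated in the induced subgraph.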

Theorem 17. Under SCH, for any $\varepsilon > 0$, Graph Motif cannot be solved in time $O((2 - \varepsilon)^k)$,
where $k$ is the distance to clique.

Proof. From an instance of Set Cover with $n$ elements, we build an equivalent instance of
Graph Motif where the distance from the graph to a clique is $n$. Again, we create one vertex for
each element and one vertex for each set. The element vertices are colored by 1 and constitute an
independent set, while the set vertices are colored by 2 and form a clique. We link each element
vertex to each set vertex if the corresponding element is in the corresponding set. The graph is
the adjacency split graph where the set vertices are the clique. $M$ contains 1 with multiplicity $n$
and 2 with multiplicity $t$. The removal of the set of all the element vertices (of size $n$) would leave
a clique. The correctness of the reduction is similar to the one of Theorem 16. □

4 Parameters for which Graph Motif is hard 

In this section, we provide several parameters for which Graph Motif is not in XP, unless
P = NP. In other words, the problem is NP-hard even for fixed values of the parameter. We also
prove that the problem remains W[1]-hard for parameter max leaf number. Figure 1 summarizes
these results.

4.1 Deletion set numbers

We study parameters which correspond to the minimum number of vertices to remove to make
the graph belong to a restricted class. We will show that Graph Motif remains NP-hard for
constant values of those parameters. More precisely, the colorful restriction of Graph Motif
is hard even if we can obtain a set of disjoint paths by removing 1 vertex, a cluster graph by
removing 1 vertex, and an acyclic graph by removing 0 edges.

Theorem 18 ([22]). Graph Motif is NP-hard even when $G$ is a tree of maximum degree 3 and
the motif is colorful.

Corollary 19. Graph Motif is NP-hard even for graphs with feedback edge set number 0 and
when the motif is colorful.

Theorem 20. Graph Motif is NP-hard even (i) for graphs with distance 1 to disjoint paths and
when the motif is colorful and (ii) for graphs with bandwidth 6 and when the motif is colorful.

Proof. We will detail only (i). We propose a reduction from Exact Cover by 3-Sets (X3C).
This special case of Set Cover is known to be NP-complete. Recall that X3C is stated as follows:
given a set $X = \{x_1, x_2, \ldots, x_{3q}\}$ and a collection $\mathcal{S} = \{S_1, \ldots, S_{|\mathcal{S}|}\}$ of 3-element subsets of $X$,
the goal is to decide if $\mathcal{S}$ contains a subcollection $T \subseteq \mathcal{S}$ such that each element of $X$ occurs in
exactly one element of $T$. The size of $X$ must be a multiple of three since a solution is a set of
triplets where each element of $X$ must appear exactly once.

Let us now describe the construction of an instance $I' = (G = (V, E), c, M)$ of Graph Motif
from an arbitrary instance $I = (X, \mathcal{S})$ of X3C (see also Figure 5). The graph $G = (V, E)$ is built
as follows: there is a distinct root $r$ and, for each $S_i \in \mathcal{S}$, there are two paths built from $r$: the first
one is made of a node $a_i^1$, three nodes representing the elements in $S_i$ and a node $b_i^1$; the other one
is made of two nodes $a_i^2$ and $b_i^2$. The graph is thus a tree such that removing $r$ gives a collection
of $2|\mathcal{S}|$ paths.

The set of colors is $C = \{1, 2, \ldots, 2|\mathcal{S}| + 3q + 1\}$. The coloring of $G$ is such that $c(a_i^1) =
c(a_i^2) = i$ and $c(b_i^1) = c(b_i^2) = |\mathcal{S}| + i$ for $1 \le i \le |\mathcal{S}|$, the $3q$ colors $2|\mathcal{S}| + 1, \ldots, 2|\mathcal{S}| + 3q$ are
assigned to the vertices corresponding to the elements of $X$, and $c(r) = 3q + 2|\mathcal{S}| + 1$. The motif is equal to the set of
colors and is thus colorful. This construction is clearly done in polynomial time with respect to the size of $I$.
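A small sketch of this construction, under assumptions: elements are the integers 1..3q, vertex names such as `("a", i, 1)` and `("x", i, e)` are hypothetical, and the helper `solution_from_cover` only collects the vertices selected by a proposed cover without checking connectivity.

```python
def x3c_to_tree_instance(q, collection):
    """Build (edges, color, motif) for sets over the elements 1..3q."""
    m = len(collection)
    color = {"r": 2 * m + 3 * q + 1}
    edges = []
    for i, subset in enumerate(collection, start=1):
        long_path = [("a", i, 1)] + [("x", i, e) for e in sorted(subset)] + [("b", i, 1)]
        short_path = [("a", i, 2), ("b", i, 2)]
        for path in (long_path, short_path):
            edges.append(("r", path[0]))        # both paths hang from the root
            edges.extend(zip(path, path[1:]))
        for j in (1, 2):
            color[("a", i, j)] = i
            color[("b", i, j)] = m + i
        for e in subset:
            color[("x", i, e)] = 2 * m + e      # one color per element of X
    motif = set(range(1, 2 * m + 3 * q + 2))    # colorful motif: every color once
    return edges, color, motif

def solution_from_cover(collection, cover_indices):
    """Vertices taken when the sets with the given (1-based) indices form the cover."""
    solution = ["r"]
    for i, subset in enumerate(collection, start=1):
        if i in cover_indices:
            solution += [("a", i, 1), ("b", i, 1)] + [("x", i, e) for e in subset]
        else:
            solution += [("a", i, 2), ("b", i, 2)]
    return solution

collection = [{1, 2, 3}, {4, 5, 6}, {1, 2, 4}]
edges, color, motif = x3c_to_tree_instance(2, collection)
chosen = solution_from_cover(collection, {1, 2})
print(sorted(color[v] for v in chosen) == sorted(motif))
```

With an exact cover, every color of the motif is hit exactly once; with a collection that is not an exact cover, some element color is repeated or missing.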



Figure 5: The graph $G$ built from $X = \{x_1, x_2, \ldots, x_6\}$ (thus with $q = 2$) and a collection $\mathcal{S}$ of
four 3-element subsets of $X$.

Let us now prove that if there is a solution for an instance $I$ of X3C, then there is a solution
for the instance $I'$ of Graph Motif. Given a solution $T \subseteq \mathcal{S}$ for $I$, a solution $P$ for $I'$ is built
as follows: take the root, for each $S_i \in T$, take the whole path from $a_i^1$ to $b_i^1$, and for each $S_i \notin T$,
take the path $a_i^2 b_i^2$. Informally speaking, for each set, either the set is in $T$ and thus the path
with the nodes corresponding to the elements is taken, otherwise the path with only two nodes is
taken. By definition of a solution for $I$, each color $2|\mathcal{S}| + 1, \ldots, 2|\mathcal{S}| + 3q$ is taken only once, and
for each color $1, \ldots, 2|\mathcal{S}|$, exactly one of the two occurrences is taken. The root is also taken and
thus the solution is connected.

Conversely, let us now prove that there is a solution for the instance $I$ of X3C if there is a
solution for the instance $I'$ of Graph Motif. First observe that the root $r$ must be in the solution
since it is the only node with this color. Also, for each $1 \le i \le |\mathcal{S}|$, either $a_i^1$ or $a_i^2$ must be in the
solution since they are the only nodes with color $i$. The same holds for $b_i^1$ and $b_i^2$. Also, observe that if
$a_i^1$ is in the solution, then $b_i^1$ must also be in the solution, with the three element nodes along the
path. Indeed, if it is not the case, the color $c(b_i^1)$ will never be in the solution since the only other
node with this color is $b_i^2$. However, in order to add $b_i^2$ in the solution, $a_i^2$ must be in the solution
to respect the connectivity constraint, which is impossible since $c(a_i^1) = c(a_i^2)$. Therefore, either
the three element nodes corresponding to a set $S_i \in \mathcal{S}$ are entirely in the solution $P$, or none are.
The solution is built as follows: $T = \{S_i : a_i^1 \in P\}$. Since $P$ is a solution, colors of $P$ appear
exactly once. Therefore, each element of $X$ appears exactly once in $T$.

For (ii), we slightly modify the graph $G$. Instead of having one vertex $r$ linked to each $a_i^j$ (for
$i \in [|\mathcal{S}|]$ and $j \in [2]$), we now have a path $R = r_1^1 r_1^2 r_2^1 r_2^2 \cdots r_{|\mathcal{S}|}^1 r_{|\mathcal{S}|}^2$, and for each $i \in [|\mathcal{S}|]$ and
$j \in [2]$, there is an edge between $r_i^j$ and $a_i^j$. We call that new graph $H$. We may observe that $H$
is a comb graph whose spine is $R$. The set of colors is now $C = [4|\mathcal{S}| + 3q]$. All the vertices in
$G - r$ keep the same colors, and for each $i \in [|\mathcal{S}|]$ and $j \in [2]$, $c(r_i^j) = 2|\mathcal{S}| + 3q + 2(i - 1) + j$.
In other words, we give a fresh and distinct color to each vertex of $R$. Again, the motif $M$ is the
entire set of colors $C$. The correctness is the same as for (i), since all vertices of $R$ must be in any
solution because they are the only occurrences of their respective color. Since the maximal paths
having exactly one vertex in the spine $R$, called teeth, are of length at most 6, the bandwidth of
$H$ is bounded by 6, too. Indeed, one can number the vertices increasingly tooth by tooth. A more
careful analysis shows that the bandwidth of $H$ is actually 5. □

Actually, one could also follow the reduction of [19] but start from a version of Sat where each 
literal appears in at most two clauses. This variant is also NP-complete, and the graph produced 
would have bandwidth 4. 

Theorem 21. Graph Motif is NP-hard even for graphs with distance 1 to cluster and when the
motif is colorful.

Proof. To prove this theorem, one can use the reduction from Colorful Set Cover to Graph
Motif where the input graph is a tree of diameter at most 4 (called superstar) [2]. The idea is
just to replace each subtree representing a set $S_i$ by a clique of size $|S_i| + 1$. Removing the root of
the former superstar in this new graph yields a disjoint union of cliques and the rest of the proof
carries over. □

4.2 Dominating set number 

Being given a small dominating set of the graph cannot help in solving Graph Motif. For any 
instance (G = ( V, E),c, M), one may add a universal new vertex v to G, and color it with a color 
which does not appear in motif M. The minimum dominating set {v} is of size 1. Vertex v cannot 
be part of the solution due to its color, so answering the new problem is as hard as solving the 
original instance. However, this could be considered as cheating since a vertex whose color is not 
in $M$ can immediately be discarded from the graph. We show that even when $\forall v \in V$, $c(v) \in M$,
graphs with dominating set of size 2 can be hard to solve.

Theorem 22. Graph Motif is NP-hard even for graphs with a minimum dominating set of
size 2 and when the motif is colorful.

Proof. We reduce from a rooted variant of Graph Motif, where the solution should contain a 
special vertex r. This variant was proven NP-hard by Ambalath et al. [2]. 

We will now prove that the problem remains hard with a small dominating set. The informal
idea is to add a universal node u such that the dominating set is small, but with a gadget to
avoid the possibility of having this universal node in a solution (which would make the problem
easy, since any subset would be connected through u). More formally, from any instance
$I = (G = (V, E), c, M)$ and any fixed vertex $r \in V$, we build the instance
$I' = (G' = (V \cup \{u, s, t\}, E'), c', M')$, where
$E' = E \cup \{\{s, t\}, \{t, r\}\} \cup \{\{u, w\} \mid w \in V\}$, $c'(w) = c(w)$ for each $w \in V$, $c'(t) = c'(u) = x$,
$c'(s) = y$, with x and y being two distinct fresh colors, and $M' = M \cup \{x, y\}$. By construction,
$\{u, t\}$ is a dominating set of $G'$ of size 2. Let R be a solution of Graph Motif for instance $I'$.
Vertex s is the only vertex with color y, so it has to be in R. But then, as the only neighbor of s
is t (and $|M'| > 2$), t should also be in R. Only one vertex with color x can be in R, so u cannot
be part of the solution. Now, the problem is as hard as solving instance $I$ rooted in r. □
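
The gadget of this proof can be sketched in a few lines of Python. This is only an illustration under assumptions: the graph encoding (vertex list, edge list, color dictionary, motif as a list) and the names `add_dominating_gadget`, `is_dominating`, `"fresh_x"` and `"fresh_y"` are hypothetical and do not come from the paper.

```python
def add_dominating_gadget(vertices, edges, color, motif, r):
    """Build I' = (G', c', M') from I = (G, c, M) and a root r, following the proof.

    vertices: list of hashable vertex names (must not contain "u", "s", "t")
    edges:    iterable of pairs of vertices
    color:    dict mapping each vertex to its color
    motif:    list of colors (a multiset)
    """
    vertices = list(vertices)
    u, s, t = "u", "s", "t"        # hypothetical names for the three new vertices
    x, y = "fresh_x", "fresh_y"    # hypothetical names for the two fresh colors
    assert not {u, s, t} & set(vertices)
    assert x not in motif and y not in motif

    new_edges = {frozenset(e) for e in edges}
    new_edges |= {frozenset((s, t)), frozenset((t, r))}      # pendant path s - t - r
    new_edges |= {frozenset((u, w)) for w in vertices}       # u is adjacent to all of V

    new_color = dict(color)
    new_color[t] = new_color[u] = x
    new_color[s] = y
    return vertices + [u, s, t], new_edges, new_color, list(motif) + [x, y]


def is_dominating(vertices, edges, candidate):
    """Check that every vertex is in `candidate` or adjacent to a vertex of it."""
    dominated = set(candidate)
    for e in edges:
        a, b = tuple(e)
        if a in candidate:
            dominated.add(b)
        if b in candidate:
            dominated.add(a)
    return dominated == set(vertices)
```

On any input graph, `is_dominating` applied to the output with candidate `{"u", "t"}` should hold, while `{"u"}` alone fails because s and t are not adjacent to u.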

4.3 Max leaf number 

The max leaf number of a graph G, denoted by ml(G), is the maximum number of leaves (i.e.,
vertices of degree 1) in a spanning tree of G. Therefore, if G is itself a tree, ml(G) is simply the
number of leaves of G. We will first show that Graph Motif is in XP parameterized by max leaf
number. The $n^{O(\mathrm{ml}(G))}$ running time of our algorithm relies on a simple structural lemma that we
state here:

Lemma 23. Let $G = (V, E)$ be a connected graph and $S \subseteq V$ be the subset of all the vertices of G
of degree at least 3. Then $|S| \leq 4\,\mathrm{ml}(G)$ and $G[V \setminus S]$ is a disjoint union of at most $5\,\mathrm{ml}(G)$ paths.

Proof. The first part of the lemma ($|S| \leq 4\,\mathrm{ml}(G)$) is already known [31]. Let us now prove the
second part: $G[V \setminus S]$ is a disjoint union of at most $5\,\mathrm{ml}(G)$ paths.

As G is connected, we can find $s - 1$ paths $P_1, \ldots, P_{s-1}$ of $G[V \setminus S]$ such that
$G[S \cup P_1 \cup \ldots \cup P_{s-1}]$ is connected, where s is the number of connected components of $G[S]$.
Therefore, we build the following spanning tree of G: we start by taking the edges of any spanning
forest of $G[S]$, plus all the edges incident to at least one vertex of a path $P_i$ (for $i \in [s-1]$).
Now, all the remaining paths in $G[V \setminus S]$ will provide (at least) one leaf each. As $s \leq |S| \leq 4k$,
if the number of paths in $G[V \setminus S]$ were larger than $5k$, then we could exhibit a spanning tree with
at least $k + 1$ leaves, which is a contradiction to $k = \mathrm{ml}(G)$. □
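
The decomposition used in the lemma (the set S of vertices of degree at least 3, and the components of the graph once S is removed) can be computed as in the following sketch. The function name `split_by_degree` and the edge-list encoding are assumptions made for illustration. The components are guaranteed to be paths only under the hypotheses of the lemma (G connected) and when G is not a cycle.

```python
from collections import defaultdict


def split_by_degree(vertices, edges):
    """Return (S, components): S holds the vertices of degree >= 3 and
    components lists the connected components of G minus S (sorted vertex lists)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    S = {v for v in vertices if len(adj[v]) >= 3}
    rest = set(vertices) - S

    seen, components = set(), []
    for start in vertices:
        if start in S or start in seen:
            continue
        # Depth-first search restricted to the vertices outside S.
        comp, stack = [], [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w in rest and w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(sorted(comp))
    return S, components
```

For instance, on a star with three legs of two vertices each (center 0), the function returns S = {0} and three components, which is consistent with the bounds of the lemma since this tree has 3 leaves.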

On the negative side, we will prove that Graph Motif is W[1]-hard with parameter max leaf
number, which is, to the best of our knowledge, the first problem to exhibit such a behavior. In
fact, we will even prove that it is W[1]-hard on trees with parameter number of leaves in the tree
plus number of distinct colors in the motif. This strengthens the previously known result that the
problem is W[1]-hard on trees with parameter number of distinct colors in the motif [22].

Theorem 24. Graph Motif can be solved in time $O^*(16^k n^{10k}) = n^{O(k)}$, where $k = \mathrm{ml}(G)$, and
is in W[P] with respect to that parameter.

Proof. Let $(G = (V, E), c, M)$ be any instance of Graph Motif, $k = \mathrm{ml}(G)$, and S the set of
vertices with degree strictly greater than 2 in G. Again, we may assume that G is connected and
also that G is not a cycle, since otherwise Graph Motif is trivially solvable in time $O(n^2)$.

It is known that $|S| \leq 4k$ (even $4k - 2$) [31]. First, we can exhaustively find in time $2^{4k} = 16^k$
the intersection $T = S \cap R$, where R is a fixed solution. By definition, the vertices of $V \setminus S$ have
degree at most 2. In particular, $G[V \setminus S]$ is a disjoint union of paths (some of the paths may consist
of a single vertex). Indeed, there cannot be a cycle in $G[V \setminus S]$ since this cycle could not be connected
to the rest of G. By Lemma 23, the number of paths in $G[V \setminus S]$ is at most $5k$.

To satisfy the connectivity constraint, solution R can intersect each of the at most $5k$ paths of
$G[V \setminus S]$ in at most $n^2$ different ways (more precisely in at most $\binom{l}{2} + l + 1$ where l is the number
of vertices in the path). So, we can guess the intersection $R \cap (V \setminus S)$ in time $(n^2)^{5k} = n^{10k}$.
Overall, we can decide Graph Motif in time $O^*(16^k n^{10k})$ where k is the max leaf number.

We can also show that Graph Motif parameterized by ml(G) is in W[P] with the characterization
of this class by Turing machines with bounded non-determinism [13]. □
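
The counting step of the proof can be checked on small cases. The sketch below enumerates the candidate intersections the proof counts for one path, namely the empty set and every contiguous subpath; the function name `path_intersections` is an assumption made for illustration.

```python
from math import comb


def path_intersections(path):
    """List the empty intersection and every contiguous subpath of `path`
    (a list of vertices given in path order)."""
    l = len(path)
    result = [[]]                      # the solution may avoid the path entirely
    for i in range(l):
        for j in range(i + 1, l + 1):
            result.append(path[i:j])   # vertices i, ..., j-1 of the path
    return result


# A path on l = 4 vertices admits comb(4, 2) + 4 + 1 = 11 candidates.
candidates = path_intersections(["a", "b", "c", "d"])
print(len(candidates), comb(4, 2) + 4 + 1)
```

Multiplying these counts over the at most $5k$ paths gives the $n^{10k}$ factor of the running time.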

Theorem 25. Graph Motif is W[1]-hard with respect to the max leaf number plus the number
of colors, even on trees.

Proof. We show the stronger result that Graph Motif is W[1]-hard on subdivisions of the star
$K_{1,k}$ with parameter $k + |C|$ where C is the set of colors. From any instance
$H = (H_1 \uplus \ldots \uplus H_k, E)$ of the W[1]-hard problem Multicolored k-Clique, we construct an
equivalent instance $(T = (V, E'), c : V \to C, M)$ of Graph Motif where T is a tree with
$k + \binom{k}{2} + 1$ leaves and C consists of $\binom{k}{2} + 3$ colors. More precisely, T is a subdivision of the
star with $k + \binom{k}{2} + 1$ leaves. We recall that the Multicolored k-Clique problem asks for a
k-clique in H hitting each $H_i$ (exactly once). By potentially adding some isolated vertices, we can
assume that each $H_i$ contains the same number t of vertices, and $H_i = \{u_{i,1}, \ldots, u_{i,t}\}$.

The set of colors C is $\{c_0, c_b, c_e\} \cup \bigcup_{i<j \in [k]} \{ij\}$ ($|C| = \binom{k}{2} + 3$). The motif M contains
$c_0$ with multiplicity 1, both $c_b$ and $c_e$ with multiplicity $s := k(t-1) + \binom{k}{2} t^2$, and for any
$i < j \in [k]$, color ij with multiplicity $t^2$. We write
$M = \{1 \times c_0, s \times c_b, s \times c_e\} \cup \bigcup_{i<j \in [k]} \{t^2 \times ij\}$, with the
convention that mul × col means color col appears in the multiset with multiplicity mul.

The tree T is a subdivision of a star with $k + \binom{k}{2} + 1$ leaves whose center v is the only vertex
colored by $c_0$. Thus, v should necessarily be in any solution. By construction, $T[V \setminus \{v\}]$ is a
disjoint union of $k + \binom{k}{2} + 1$ paths. We can think of those paths as oriented from the neighbor
of v (the first vertex of the path) to the vertex farthest away from v (the last vertex of the
path). We will extensively call those paths oriented paths. By this, we only mean something
informal about a potential solution growing from v along those paths, and we do not mean that
the graph we build is directed. For each $i \in [k]$, a path $P_i$ will correspond to the vertices of $H_i$ and
for any pair $i < j \in [k]$, a path $P_{i,j}$ will encode the edges of $E_{i,j} := E(H_i, H_j)$. Additionally, we
have a path $P_{b,e}$ with 2s vertices alternating colors $c_b$ and $c_e$; the first vertex of the path is colored
by $c_b$, the second by $c_e$, and so forth.

Before we describe the $P_i$s and the $P_{i,j}$s, we introduce the notion of block and indicate a useful
property that the construction will satisfy. A block is a subpath of an oriented path which starts
with a vertex colored by $c_b$ (as begin), ends with a vertex colored by $c_e$ (as end), and such that
no internal vertex in the subpath has color $c_b$ or $c_e$. The path $P_{b,e}$ can be seen as s consecutive
empty blocks. We may also observe that two different blocks of the same oriented path cannot
intersect. We will construct the $P_i$s and the $P_{i,j}$s such that they are entirely spanned by blocks;
we call this the alternating property. Therefore, every vertex except v is contained in a (unique)
block. In particular, each oriented path $P_i$ or $P_{i,j}$ has its first vertex colored by $c_b$ and its last
vertex colored by $c_e$. And, if we only consider vertices colored by $c_b$ and $c_e$ along the path, they
alternate $c_b - c_e\ c_b - c_e \ldots$ with the extra property that there is no vertex between color $c_e$ and
$c_b$ (see Figure 6). A connected subgraph of T containing v (i.e., a potential solution) is entirely
defined by $k + \binom{k}{2} + 1$ stopping points: one for each oriented path $P_{b,e}$, $P_i$, or $P_{i,j}$. A stopping
point of an oriented path P with respect to a given (attempt of) solution R is the farthest vertex
from v lying in $R \cap P$. Observe that the unique path from v to a stopping point is exactly the
intersection of the solution and the oriented path. If $R \cap P = \emptyset$, by convention, the stopping point
is v. It is easy to see that, in each oriented path $P_{b,e}$, $P_i$, or $P_{i,j}$, a stopping point relative to an
actual solution is either v or a vertex colored by $c_e$ (that is the end of a block). Put differently, if
R is a solution and B is a block, $R \cap B = \emptyset$ or $R \cap B = B$. Indeed, if it is not the case, because of
the alternating property, the chosen connected subgraph would contain at least one more vertex
colored by $c_b$ than colored by $c_e$, and would not satisfy the multiset constraint. Therefore, within
a block, the order of the internal vertices does not matter.

We now describe the path $P_i$ for each $i \in [k]$. The oriented path $P_i$ consists of $t - 1$ copies
of the same block $B_i$ put one after the other. The internal vertices of $B_i$ consist of one vertex
colored by li for each $l \in [i-1]$ and t vertices colored by ij for each $j \in [i+1, k]$ (see Figure 7).
We may recall that the order of the internal vertices of a block is irrelevant. Notice also that the
$P_i$s depend only on the number t of vertices per $H_i$. As $P_i$ is made of $t - 1$ blocks, there are t
stopping points, and, intuitively, the q-th stopping point corresponds to taking $u_{i,q}$ as part of the
multicolored clique in H. As a slight overload of notation, we will also denote by $u_{i,q}$ the q-th
stopping point of path $P_i$. By convention, $u_{i,1}$ is v.

Figure 6: Illustration of the global construction and the alternating property. Color $c_b$ is
represented in green (light gray) and $c_e$ in red (dark gray). The stopping points just precede the
vertical cuts.

Figure 7: The oriented paths $P_i$, $P_j$, and $P_{i,j}$. Again, color $c_b$ is represented in green (light gray)
and color $c_e$ in red (dark gray). Note that the $P_i$s depend only on the number t of vertices per
color class, while $P_{i,j}$ actually encodes the adjacency between $H_i$ and $H_j$ in some flattened form.

To motivate the definition of the $P_{i,j}$s, we need to explain how we can think of pairs of $H_i \times H_j$
as integers of $[0, t^2 - 1]$. Say the stopping point of a given solution R is $u_{i,q+1}$ in $P_i$ for some
$q \in [0, t-1]$, and $u_{j,q'+1}$ in $P_j$ for some $q' \in [0, t-1]$ (with $i < j$). The number of vertices colored
by ij contained in $R \cap (P_i \cup P_j)$ is $tq + q'$; this number corresponds to a unique pair of stopping
points. Indeed, the function $\phi : x \in [0, t^2 - 1] \mapsto (\lfloor x/t \rfloor, x \bmod t) \in [0, t-1] \times [0, t-1]$ is bijective
since $\lfloor x/t \rfloor$ and $x \bmod t$ are the quotient and the remainder of the Euclidean division of x by t.
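
The bijection between integers of $[0, t^2 - 1]$ and pairs of stopping points is just Euclidean division, as the following minimal sketch shows. The function names `phi` and `phi_inverse` are chosen here for illustration.

```python
def phi(x, t):
    """Map x in [0, t*t - 1] to the pair (quotient, remainder) of x divided by t."""
    return divmod(x, t)


def phi_inverse(q, q_prime, t):
    """Map a pair (q, q') in [0, t-1] x [0, t-1] back to the integer t*q + q'."""
    return t * q + q_prime


# Round trip for t = 3: every x in [0, 8] is recovered from its pair.
t = 3
pairs = [phi(x, t) for x in range(t * t)]
print(pairs)
print([phi_inverse(q, qp, t) for q, qp in pairs])
```

Each of the $t^2$ pairs appears exactly once in the first printed list, and the second list is 0, 1, ..., 8 in order.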

For any $i < j \in [k]$, the oriented path $P_{i,j}$ consists of $|E_{i,j}|$ blocks whose internal vertices
are all colored by ij. We define three auxiliary lists of $|E_{i,j}|$ integers each, indexed from 1 to
$|E_{i,j}|$. The third list will correspond to how many vertices colored by ij we put in the $|E_{i,j}|$
consecutive blocks. The first list $A_{i,j}$ contains, in increasing order, every integer $x \in [0, t^2 - 1]$
such that if $\phi(x) = (q, q')$, it holds that $u_{i,q+1} u_{j,q'+1} \in E_{i,j}$. Intuitively, it is the sorted list of
integers in $[0, t^2 - 1]$ which are edges of $E_{i,j}$. The second list $L_{i,j}$ contains, in increasing
order, all the integers $t^2 - x$ such that $x \in A_{i,j}$. The easiest way to obtain $L_{i,j}$ from $A_{i,j}$ is
to complement to $t^2$ each integer in $A_{i,j}$, which yields a list sorted in decreasing order, and to
reverse the result. The third list $D_{i,j}$ is defined by $D_{i,j}[1] := L_{i,j}[1]$ and, for every $h \in [2, |E_{i,j}|]$,
$D_{i,j}[h] := L_{i,j}[h] - L_{i,j}[h-1]$. Finally, for every $h \in [|E_{i,j}|]$, the h-th block of $P_{i,j}$ gets $D_{i,j}[h]$
vertices colored by ij (see Figure 7). This ends the construction of the instance of Graph Motif.
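
To make the three lists concrete, here is a small sketch that computes them for one pair i < j and checks which totals of color ij a solution can reach. The encoding of $E_{i,j}$ as a set of index pairs (a, b), meaning that $u_{i,a} u_{j,b}$ is an edge, and the names `edge_lists` and `reachable_totals` are assumptions made for illustration. Python lists are indexed from 0, whereas the paper indexes from 1.

```python
def edge_lists(t, edge_pairs):
    """Return the lists (A, L, D) for one pair i < j.

    edge_pairs: set of (a, b) with 1 <= a, b <= t, meaning u_{i,a} u_{j,b} is an edge.
    """
    A = sorted(t * (a - 1) + (b - 1) for a, b in edge_pairs)   # edges as integers
    L = sorted(t * t - x for x in A)                           # complements to t^2
    # Consecutive differences of L, with D[0] = L[0].
    D = [L[h] - (L[h - 1] if h > 0 else 0) for h in range(len(L))]
    return A, L, D


def reachable_totals(t, a, b, D):
    """Totals of color ij when R stops at u_{i,a} in P_i, at u_{j,b} in P_j,
    and after h blocks of P_{i,j}, for every possible h."""
    base = t * (a - 1) + (b - 1)          # vertices colored ij in P_i and P_j
    return {base + sum(D[:h]) for h in range(len(D) + 1)}


t = 2
A, L, D = edge_lists(t, {(1, 1), (2, 2)})
print(A, L, D)
print(t * t in reachable_totals(t, 1, 1, D))   # an edge: t^2 is reachable
print(t * t in reachable_totals(t, 1, 2, D))   # a non-edge: t^2 is not reachable
```

In this toy case the prefix sums of D reproduce L, so the total $t^2$ is reachable exactly for the pairs that are edges, which is the property the correctness proof relies on.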

Suppose there is a multicolored clique $C := \{u_{1,q_1}, \ldots, u_{k,q_k}\}$ in H. We construct a solution
R to the produced instance $(T, c, M)$ in the following way. For each $i \in [k]$, the stopping point
of R in path $P_i$ is $u_{i,q_i}$. For any pair $i < j \in [k]$, let $y_{i,j} := t^2 - \phi^{-1}(q_i - 1, q_j - 1)$, and let $h_{i,j}$
be the index such that $y_{i,j} = L_{i,j}[h_{i,j}]$. The stopping point of R in path $P_{i,j}$ is right after its
$h_{i,j}$-th block. The subtree induced by those $k + \binom{k}{2}$ stopping points contains the same number z
of vertices colored by $c_b$ and of vertices colored by $c_e$. As z is non-negative and cannot exceed s,
solution R can and will stop after $s - z$ blocks in $P_{b,e}$, thereby fulfilling the multiset constraint for
colors $c_b$ and $c_e$. By construction (from vertex v along the oriented paths), R induces a connected
subgraph.

What remains to be seen is that $h_{i,j}$ is well defined and that, for each $i < j \in [k]$, R contains
exactly $t^2$ vertices colored by ij. A preliminary easy observation is that vertices colored by ij
only appear in three oriented paths: $P_i$, $P_j$, and $P_{i,j}$. For any pair $i < j \in [k]$, as C is a clique,
$u_{i,q_i} u_{j,q_j} \in E_{i,j}$. Thus, the value $\phi^{-1}(q_i - 1, q_j - 1)$ is in $A_{i,j}$, and so,
$y_{i,j} = t^2 - \phi^{-1}(q_i - 1, q_j - 1)$ is in $L_{i,j}$. This means that $h_{i,j}$ exists. Also, by definition of $\phi$,
$\phi^{-1}(q_i - 1, q_j - 1)$ corresponds to the number of vertices colored by ij in $R \cap (P_i \cup P_j)$.
Therefore, $y_{i,j}$ is exactly the number of vertices colored by ij we want to have in $R \cap P_{i,j}$. As we
stop R in $P_{i,j}$ after $h_{i,j}$ blocks, the number of vertices colored by ij in $R \cap P_{i,j}$ is
$\sum_{1 \leq r \leq h_{i,j}} D_{i,j}[r]$. And,
$\sum_{1 \leq r \leq h_{i,j}} D_{i,j}[r] = \left(\sum_{2 \leq r \leq h_{i,j}} L_{i,j}[r] - L_{i,j}[r-1]\right) + L_{i,j}[1] = L_{i,j}[h_{i,j}] = y_{i,j}$.
Hence, the total number of vertices colored by ij in R is $\phi^{-1}(q_i - 1, q_j - 1) + y_{i,j} = t^2$.

Now, suppose that there is no multicolored clique in H. We will show that there cannot be a
solution to the instance of Graph Motif. For the sake of contradiction, we assume that R is a
solution. As explained during the construction, vertex v has to be in R and the stopping points in
each oriented path $P_i$, $P_{i,j}$, and $P_{b,e}$ should coincide with the end of blocks. In particular, in each
$P_i$, the stopping point of R should be a vertex $u_{i,q}$. Thus, let $u_{1,q_1}, \ldots, u_{k,q_k}$ be the stopping points
of R in $P_1, \ldots, P_k$. As there is no multicolored clique in H, there exists at least one pair $i < j \in [k]$
such that $u_{i,q_i} u_{j,q_j} \notin E_{i,j}$. Let h be the number of blocks in $R \cap P_{i,j}$; in other words, R stops in
$P_{i,j}$ after h blocks. We now show that R cannot contain exactly $t^2$ vertices colored by ij, and hence, is
not a solution. The number of vertices colored by ij in $R \cap (P_i \cup P_j)$ is $\phi^{-1}(q_i - 1, q_j - 1) \notin A_{i,j}$.
As $x \in [0, t^2 - 1] \mapsto t^2 - x \in [t^2]$ is bijective, it means that $t^2 - \phi^{-1}(q_i - 1, q_j - 1) \notin L_{i,j}$.
Besides, the number of vertices colored by ij in $R \cap P_{i,j}$ is $\sum_{1 \leq r \leq h} D_{i,j}[r]$. We observed in the
previous paragraph that $L_{i,j}[h] = \sum_{1 \leq r \leq h} D_{i,j}[r]$. Hence
$t^2 - \phi^{-1}(q_i - 1, q_j - 1) \neq \sum_{1 \leq r \leq h} D_{i,j}[r]$,
so $\phi^{-1}(q_i - 1, q_j - 1) + \sum_{1 \leq r \leq h} D_{i,j}[r] \neq t^2$. □

As is usually the case with FPT reductions from Multicolored k-Clique using edge
representations, the parameter goes from k to $O(k^2)$. Thus, concerning running-time lower bounds,
the previous reduction only shows that solving Graph Motif in time $n^{o(\sqrt{\mathrm{ml}(G) + |C|})}$ would also
solve Multicolored k-Clique in time $n^{o(k)}$, which is known to disprove ETH, and even to imply
that FPT = W[1] [11]. Nevertheless, we can strengthen this lower bound by performing the
same reduction from Partitioned Subgraph Isomorphism. In the Partitioned Subgraph
Isomorphism problem, one is given two graphs H and G. The vertices of graph H are partitioned
into $|V(G)|$ classes $C_v$, one for each vertex v of G. The goal is to find an injective mapping
$h : V(G) \to V(H)$ such that if $uv \in E(G)$, then $h(u)h(v) \in E(H)$, and for each $v \in V(G)$, $h(v) \in C_v$.
Under ETH, Partitioned Subgraph Isomorphism cannot be solved in time $n^{o(k/\log k)}$ where
k is the number of edges of the smaller graph G [35]. Observe that we can ignore isolated vertices
in G (we are looking for a subgraph, not an induced subgraph). Thus, the number of edges in G
is at least $|V(G)|/2$, and ETH even implies that Partitioned Subgraph Isomorphism cannot
be solved in time $n^{o(k/\log k)}$ where $k = |V(G)| + |E(G)|$.

The reduction from Partitioned Subgraph Isomorphism to Graph Motif encodes the
graph H partitioned into the $C_v$s but only introduces a color ij and a path $P_{i,j}$ if there is an edge
in G between the i-th and the j-th vertex. The number of leaves in T is $|V(G)| + |E(G)| + 1$ and
the number of colors of C is $|E(G)| + 3$. Thus, we get that, under ETH, Graph Motif cannot
be solved in $n^{o((\mathrm{ml}(G) + |C|)/\log(\mathrm{ml}(G) + |C|))}$. Therefore, our algorithm running in time $n^{O(\mathrm{ml}(G))}$
is probably optimal up to logarithmic factors in the exponent.

The Graph Motif problem on subdivisions of stars can be reformulated as the following
problem on words: given a set of k + 1 words $w_1, \ldots, w_k$, and w over an alphabet $\Sigma$, find
$w'_1, \ldots, w'_k$ such that for each $i \in [k]$, $w'_i$ is a prefix of $w_i$, and the concatenation $w'_1 \cdots w'_k$ is
an anagram of w. Indeed, hard instances of Graph Motif on subdivisions of stars are such that
the center of the subdivided star should necessarily be in a solution (otherwise, the whole solution
is entirely contained in an induced path, and can be computed in polynomial time). Then, letters
correspond to colors, w to the multiset M, and the $w_i$s to the words formed by the colors of
the vertices in each oriented path. Therefore, Theorem 25 entails that this problem is W[1]-hard
parameterized by $k + |\Sigma|$ (number of words plus size of the alphabet). However, as far as we know,
this problem has not appeared in the literature.
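
The word problem above can be stated very compactly in code. The following brute-force sketch tries every combination of prefix lengths, so its running time is exponential in the number of words; it only illustrates the problem statement and is not an algorithm from the paper. The function name `prefix_anagram` is an assumption.

```python
from collections import Counter
from itertools import product


def prefix_anagram(words, target):
    """Return a tuple of prefix lengths (one per word) such that the concatenated
    prefixes form an anagram of `target`, or None if no such choice exists."""
    goal = Counter(target)
    for lengths in product(*(range(len(w) + 1) for w in words)):
        if sum(lengths) != len(target):
            continue                       # an anagram must have the same length
        letters = Counter()
        for w, n in zip(words, lengths):
            letters.update(w[:n])          # letters of the prefix of length n
        if letters == goal:
            return lengths
    return None


print(prefix_anagram(["ab", "ca"], "abc"))
print(prefix_anagram(["ab", "ba"], "bb"))
```

In the first call the prefixes "ab" and "c" work, so the lengths (2, 1) are returned; in the second call no choice of prefixes contains two occurrences of the letter b, so the result is None.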

We may finally observe that Graph Motif on paths is an established string problem going 
by the name of jumbled pattern matching (see for instance m- In this problem, one has to find, 
given a string and a Parikh vector (or multiset of letters), a substring whose occurrences of letters
match the Parikh vector. Therefore, Graph Motif can be seen as a generalization of this string 
problem to more complex structures. 
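
For intuition, jumbled pattern matching admits a simple sliding-window solution, sketched below. This is a standard textbook approach and is not taken from the paper; the function name `jumbled_match` is an assumption. On a path whose vertex colors spell `text`, a match corresponds to a connected subgraph (a subpath) whose colors form the motif.

```python
from collections import Counter


def jumbled_match(text, parikh):
    """Return the start index of a substring of `text` whose letter counts equal
    `parikh` (a mapping letter -> multiplicity), or -1 if there is none."""
    target = +Counter(parikh)              # unary plus drops zero or negative counts
    m = sum(target.values())               # length of any matching substring
    if m > len(text):
        return -1
    window = Counter(text[:m])
    for start in range(len(text) - m + 1):
        if window == target:
            return start
        if start + m < len(text):
            # Slide the window one position to the right.
            left = text[start]
            window[left] -= 1
            if window[left] == 0:
                del window[left]           # keep the counter free of zero entries
            window[text[start + m]] += 1
    return -1


print(jumbled_match("aabcb", Counter("cb")))
print(jumbled_match("abc", Counter("dd")))
```

The first call finds the substring "bc" at index 2, and the second returns -1 since the letter d does not occur in the text.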


5 Conclusion and open problems 

Figure 1 sums up the parameterized complexity landscape of Graph Motif with respect to
structural parameters. For parameter maximum independent set the complexity status of Graph 
Motif remains unknown. Even when the problem is in FPT, polynomial kernels tend to be 
unlikely; be it for the natural parameter even on comb graphs [2] or for the vertex cover number 
or the distance to clique (Theorem If ID . Is it also the case for parameter cluster editing number? 

On the one hand, we saw that our algorithm running in $O^*(3^k)$ for parameter distance to clique
is probably close to optimal, since $O^*((2 - \varepsilon)^k)$ is unlikely. On the other hand, for parameter vertex
cover number, for instance, we have more room for improvement between the $2^{O(k \log k)}$ upper
bound and the $2^{o(k)}$ lower bound under ETH. Can we improve the algorithm to time $2^{O(k)}$, or, on
the contrary, show a stronger lower bound of $2^{o(k \log k)}$ (potentially with the framework developed
by Lokshtanov et al. [36])?

A possible future work would be to see if the FPT algorithms presented in the article can be 
extended to the more general List Graph Motif, where a vertex can choose its color among a 
private list of colors, without damaging too much their running time. 

Finally, one could consider more restricted versions (when, for instance, the number of colors, 
or the maximum multiplicity of the motif, or the maximum number of occurrences of a color in
the graph, is bounded). This line of work is sometimes called multi-parameter analysis, where one
seeks FPT algorithms with respect to subsets of parameters. Let us recall, as an example, that
Graph Motif is in XP if the parameter is the treewidth of the graph plus the number of colors 
in the motif [22] . 